Abstract

Researchers rely on the mathematics of Vapnik and Chervonenkis to capture
quantitatively the capabilities of specific artificial neural network (ANN)
architectures. The quantifier is known as the V-C dimension, and is defined on
functions or sets. Its value is the largest cardinality $l$ such that there is at
least one set of $l$ vectors in $\mathbb{R}^d$ for which all dichotomies of that set
into two sets can be implemented by the function or set. Stated another way, the V-C
dimension of a set of functions is the largest cardinality of a set, such that there
exists one set of that cardinality which can be shattered by the set of functions. A
set of functions is said to shatter a set if each dichotomy of that set can be
implemented by a function in the set. There is an abundance of research on determining
the value of V-C dimensions of ANNs. In this document, research on V-C dimension
is refined and extended, yielding formulas for evaluating V-C dimension for the set
of functions representable by a feed-forward, single hidden-layer perceptron artificial
neural network.

The  fundamental  thesis  of  this  research  is  that  the  V-C  dimension  is  not  an 
appropriate  quantifier  of  ANN  capabilities.  Consequently,  the  results  of  this  research 
provide  a  basis  of  mathematics  on  which  to  build  quantifiers  that  address  the  specifics 
of  ANN’s  ability  based  on  invariant  characterizations  of  signed  sets.  Specifically,  the 
lattice  structure  of  ANNs  is  investigated.  Moreover,  a  cut-intersection  semi-lattice  is 
established  upon  which  invariant  analysis  of  an  arrangement  of  hyperplanes  can  be 
examined.  As  a  consequence  of  the  study  of  combinatorial  geometry  of  hyperplane 
arrangements,  it  is  shown  that  solutions  to  the  chamber  counting  problem  that  are 
based on analysis of the Poincaré polynomial also provide a closed form relation for
determining the value of the V-C dimension of ANNs. This provides a relationship
between the study of combinatorial geometry of hyperplane arrangements and ANN
capability analysis.

In  addition,  a  generalized  framework  in  which  to  perform  ANN  capabilities 
analysis  is  presented.  The  framework  is  based  on  invariant  analysis.  Moreover, 
an  invariant  based  on  geometric  complexity  defined  by  concepts  of  combinatorial 
geometry  is  presented  and  evaluated. 

Finally, an instantiation of the framework is given. The quantifier is called
the Ox-Cart dimension. It is a function of an invariant called the geometric
complexity. This quantifier is directed at analysis of specific geometric arrangements
of signed sets. In other words, O-C dimension characterizes an ANN’s ability to solve
a particular classification problem where the problem is characterized by its
geometric complexity. This differs from V-C dimension, which is about arbitrary sets.
V-C dimension characterizes an ANN’s ability to solve the worst case geometry and
dichotomy of a classification problem characterized only by cardinality. Using the
Ox-Cart dimension, a lattice of feed-forward, single hidden-layer, perceptron artificial
neural networks is defined. This structure is shown to facilitate ANN architecture
capabilities comparisons.
THE MATHEMATICS OF MEASURING CAPABILITIES OF
ARTIFICIAL  NEURAL  NETWORKS 


I.  Introduction 

The research described in this document is dedicated to improving the methods
and mathematics used to determine capabilities of feed-forward artificial neural
networks  (ANNs).  A  generalized  framework  for  deriving  ANN  capability  quantifiers 
is  developed.  In  order  to  provide  the  framework,  the  mathematical  structure  of 
the  problem,  generalized  solution,  and  methods  of  analyzing  both  had  to  be  clearly 
defined and investigated. Invariant analysis is used extensively to define the
generalized framework. As a result, a capability quantifier called the Ox-Cart dimension
is  defined  which  characterizes  the  adequacies  of  ANN  solutions  to  the  classification 
problem. The Ox-Cart dimension is based on a geometric characterization of
classification problems, called the geometric complexity. Both the Ox-Cart dimension
and  the  geometric  complexity  exemplify  structures  in  the  generalized  framework  and 
exhibit  desired  invariant  properties.  The  Ox-Cart  dimension  will  be  used  to  define  a 
partial  ordering  which  will  produce  the  lattice  of  feed-forward,  single  hidden-layer, 
perceptron  ANNs.  The  lattice  structure  facilitates  comparisons  of  ANN  architectures 
based  on  their  ability  to  solve  classification  problems. 

The novelty of this approach to quantifying ANNs’ capabilities is that it is
designed to be consistent with properties specific to analyzing neural network
architectures and with problem-specific characteristics. This is different from current
methods based on the Vapnik-Chervonenkis (V-C) dimension, which analyzes an ANN’s
ability to solve arbitrary classification problems. However, in some aspects the Ox-Cart
dimension parallels V-C dimension; hence the repeated use of the term dimension.
(It should be noted that the term Ox-Cart dimension was created solely to parallel
V-C dimension. Neither quantifier is a dimension in the traditional mathematical
sense, the cardinality of a basis set.)

Considerable background research on the V-C dimension approach led to
significant results in two general topics. The first result extended existing bounds and
precise  values  for  V-C  dimension  and  related  mappings.  This  was  accomplished  by 
analyzing  and  exploiting  the  algebraic  and  topological  structures  of  the  mappings. 
The  second  result  found  that  there  is  a  strong  connection  between  the  study  of 
ANNs’ capability and an area of mathematics that will be referred to as
combinatorial geometry. In fact, it will be shown that the V-C dimension of an ANN
can  be  evaluated  using  combinatorial  geometric  methods  for  counting  chambers  of  a 
hyperplane  arrangement. 

Combinatorial geometry is a rich area of mathematics that relies heavily on
a  lattice  structure.  When  that  structure  is  in  place,  much  can  be  said  about  the 
capacity  of  a  system;  specific  to  this  research  are  the  systems  generated  by  ANNs. 
In  this  document,  the  groundwork  will  be  laid  to  make  combinatorial  geometry 
available  to  the  artificial  neural  network  community  of  researchers.  Specifically,  the 
cut-intersection  semi-lattice  of  an  ANN  will  be  defined  and  used  to  obtain  invariants 
for  capability  analysis.  Additionally  a  lattice  of  the  chamber  sets  of  ANNs  will  be 
established. 

This research relied heavily on previous work in the area of measuring ANN
capabilities. For example, extending the results of the significant 1957 work of Andrei
Kolmogorov, G. Cybenko showed that the set of functions that can be approximated
by a multi-layer perceptron network is dense in the set of continuous functions
defined on the n-dimensional cube (18). This result was extended further by A.
Ronald Gallant and Halbert White, who showed that ANNs are dense in Sobolev
spaces, suggesting that derivative information can be approximated (19).
What is not so clear about the capabilities of an ANN is what it takes to
learn  these  functions.  What  size  ANN  is  required?  How  many  training  samples  are 
needed?  Recent  publications  on  this  subject  have  borrowed  from  important  works 
by  Cover  and  Vapnik-Chervonenkis.  Baum  used  theorems  of  Cover  to  give  bounds 
on the required ANN size for certain problems. More recently, Eduardo Sontag
proposed mappings similar to V-C dimension for the specific purpose of analyzing ANN
capabilities.  Sontag  derived  interesting  results  about  sigmoidal  transfer  functions 
and  direct  connections  using  his  mappings  (39).  This  dissertation  extends  these 
ideas  even  further. 

The following sections will explain the details of classification problems,
artificial neural networks, and what is meant by the ability to generalize. With this
explanation,  a  common  set  of  terminology  will  be  established. 

1.1  Problem:  Generalization  of  Classification  Data 

In general, there are two distinct problems solved with artificial neural
networks: function approximation (interpolation) and data classification. One can think
of both of these problems as function approximation, where interpolation
approximates functions of the form

$$f : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}, \qquad d_1, d_2 \in \mathbb{N},$$

and two-class classification problems approximate functions of the form

$$f : \mathbb{R}^{d} \to \{0, 1\}, \qquad d \in \mathbb{N}.$$

The distinction is made since analysis of complexity of the problems is different.
(Note that throughout this document $\mathbb{R}^d$ refers to d-dimensional Euclidean space.)
The research outlined by this document deals with the problem of classification of
data. The research will concentrate on the two-class problem.
The two-class problem (sometimes called the two-coloring problem, which is
a special case of the n-coloring problem) refers to the problem of separating a set
of points, $X \subset \mathbb{R}^d$, into two pre-defined subsets, $X^+$ and $X^-$, where
$X^+ \cap X^- = \emptyset$ and $X^+ \cup X^- = X$. The two subsets, $X^+$ and $X^-$, together
define a dichotomy or partition on $X$. It is said that the solution to a classification
problem is a function $f$ which implements the dichotomy, that is, $f(x) > 0$ for all
$x \in X^+$ and $f(x) < 0$ for all $x \in X^-$ for a given set $X$. The ordered pair
$(X^+, X^-)$ is referred to as a signed set. Referring to $(X^+, X^-)$ as an ordered pair
is meaningful and accurate since $(X^+, X^-) \neq (X^-, X^+)$. The space of signed sets is
investigated in detail as part of the new research presented in this dissertation.
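
As an illustration of the definitions above, the following minimal Python sketch represents a signed set as a pair of point lists and checks whether a candidate function implements the dichotomy. The helper name `implements`, the sample points, and the function `f` are hypothetical choices made for this example only; they do not come from the dissertation.

```python
def implements(f, x_plus, x_minus):
    """Return True if f(x) > 0 on every point of x_plus and f(x) < 0 on x_minus."""
    return all(f(x) > 0 for x in x_plus) and all(f(x) < 0 for x in x_minus)


# A hypothetical signed set (X+, X-) in the plane.
x_plus = [(1.0, 1.0), (2.0, 0.5)]
x_minus = [(0.0, 0.0), (0.5, 0.2)]


def f(x):
    # Illustrative separating function: f(x) = x1 + x2 - 1.
    return x[0] + x[1] - 1.0


print(implements(f, x_plus, x_minus))  # f implements (X+, X-)
print(implements(f, x_minus, x_plus))  # the swapped pair (X-, X+) is a different signed set
```

The second call shows why the order of the pair matters: the same function that implements one signed set need not implement the pair with its two halves exchanged.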

1.2 Tools: Feed-Forward, Single Hidden-Layer, Perceptron Artificial Neural Networks

The research in this dissertation concerns feed-forward, single hidden-layer,
perceptron artificial neural networks. See Figure 1 for a graphical depiction.
Feed-forward refers to the upward direction of the graphical depiction in which the input
vectors are processed. Note that the picture shows three layers of processing units
referred to as nodes, units, or processors. The bottom layer is the input layer of
nodes. This is the starting point for the input data. The middle layer is referred
to as the hidden layer. The top layer is referred to as the output layer. The layers
are connected by interconnections. Each interconnection has an associated weight.
Each node performs a process on incoming vectors or scalars and outputs vectors
or scalars. The process may differ depending on the layer. In a single hidden-layer
perceptron artificial neural network, the middle layer nodes are perceptron nodes.
In the case of a perceptron node, the input is usually the dot product of the input
vector and the node’s incoming weight vector. This dot product is compared to
the node’s threshold value, and based on the comparison a scalar is output. The
comparison is performed by a sigmoid function sometimes referred to as a threshold
function or transfer function. In addition, a direct connection may be included. The
direct connection processes the input vector straight to the output layer bypassing
the middle layer and has no associated sigmoid function.

Figure 1. A feed-forward, single hidden-layer, perceptron ANN. (The input layer
receives $x \in \mathbb{R}^d$.)

Now  consider  the  mathematics  of  the  ANN. 

Definition. A function, $\theta : \mathbb{R} \to \mathbb{R}$, will be called a sigmoid if the limits

$$t_+ := \lim_{s \to +\infty} \theta(s) \quad \text{and} \quad t_- := \lim_{s \to -\infty} \theta(s)$$

exist.


The following are two examples of commonly used sigmoid functions. The first is
the standard sigmoid function

$$\frac{1}{1 + e^{-s}}. \tag{1}$$

The second is referred to as the hard-limiter or Heaviside function

$$H(s) = \begin{cases} 0 & \text{if } s < 0 \\ 1 & \text{if } s > 0. \end{cases} \tag{2}$$

The input into each perceptron node is the dot product of the input vector,
$x \in \mathbb{R}^d$, and the node’s associated incoming interconnection weight vector,
$v_i \in \mathbb{R}^d$. Each node has an associated scalar threshold value, $\tau_i \in \mathbb{R}$.
The threshold value is subtracted from the dot product and the result is passed
through a sigmoid. The output of each perceptron can be defined as
$\theta(v_i \cdot x - \tau_i)$. Note that if $f(x) = v_i \cdot x - \tau_i$, then $f(x) = 0$ is the
equation of a hyperplane in $\mathbb{R}^d$. The function $f$ is also referred to as a
separating surface.


Each of the $k$ perceptron output values, which are scalars, is multiplied by
the perceptron’s associated outgoing interconnection weight, $\omega_i$, which is a scalar.
These values are the input to the output layer where they are summed together to
produce

$$\sum_{i=1}^{k} \omega_i \theta(v_i \cdot x - \tau_i).$$

Define the sigmoid function $\mathcal{H}$ to be

$$\mathcal{H}(s) = H(s) - H(-s) = \begin{cases} 1 & \text{if } s > 0 \\ 0 & \text{if } s = 0 \\ -1 & \text{if } s < 0. \end{cases}$$


Note that if $\theta = \mathcal{H}$, then the value $\theta(v_i \cdot x - \tau_i)$ can be interpreted as
an indication of the input vector, $x$, being on one side of the hyperplane,
$\{x \in \mathbb{R}^d : f(x) = v_i \cdot x - \tau_i = 0\}$, or the other. Equivalently, it indicates
if $x$ is in the positive halfspace, $h^+$, where

$$h^+ = \{x : f(x) > 0\},$$

or the negative halfspace, $h^-$, where

$$h^- = \{x : f(x) < 0\}.$$

Moreover, if the outgoing weights are binary, i.e. $\omega_i \in \{0, 1\}$, then the value

$$\sum_{i=1}^{k} \omega_i \theta(v_i \cdot x - \tau_i)$$

can distinguish between intersections of the halfspaces. Moreover, it completely
defines the function $F$ of an ANN for any $x \in \mathbb{R}^d$. Specifically, define $F$ as

$$F(x) = \sum_{i=1}^{k} \omega_i \theta(v_i \cdot x - \tau_i), \tag{3}$$

where $\omega_i \in \{0, 1\}$, $\theta = \mathcal{H}$, $v_i \in \mathbb{R}^d$ and $\tau_i \in \mathbb{R}$ for
$i = 1, \ldots, k$. This is the equation of the ANN which is the object of capability
analysis of the new research in this dissertation.

Equation 3 is not the most general definition of a feed-forward, single hidden-layer,
perceptron artificial neural network. The most general case allows the
perceptron’s outgoing interconnection weights to be any real value. Additionally, a
direct connection is included that is defined by

$$\omega_0 + v_0 \cdot x,$$

where $\omega_0 \in \mathbb{R}$ and $v_0 \in \mathbb{R}^d$. Combining yields the following function, $G$,
that defines a general feed-forward, single hidden-layer, perceptron artificial neural
network. Specifically,

$$G(x) = \omega_0 + v_0 \cdot x + \sum_{i=1}^{k} \omega_i \theta(v_i \cdot x - \tau_i), \tag{4}$$

where $\theta$ is any sigmoid function, $v_i \in \mathbb{R}^d$, $\omega_i, \tau_i \in \mathbb{R}$, for
$i = 0, 1, \ldots, k$. Note that $F = G$ if $\omega_i \in \{0, 1\}$, $\theta = \mathcal{H}$, $\omega_0 = 0$,
and $v_0 = 0$.
The parameters $v_i$, $\omega_i$, and $\tau_i$, for $i = 0, 1, \ldots, k$, are approximated to
yield an approximate function. The approximated function will be referred to as a
solution or an ANN solution to the classification problem described by a signed set.
The approximation process is referred to as learning or training and is accomplished
with the use of a training set of data which is a signed set. All possible such $G$’s
will be referred to as a family.
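
The network functions of Equations 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the dissertation's implementation: the function names (`dot`, `signed_limiter`, `logistic`, `G`, `F`) and the example weights are assumptions introduced here, and the parameters are simply fixed by hand instead of being learned.

```python
import math


def dot(u, x):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, x))


def signed_limiter(s):
    """The sigmoid H(s) - H(-s): 1 if s > 0, 0 if s == 0, -1 if s < 0."""
    return (s > 0) - (s < 0)


def logistic(s):
    """The standard sigmoid of Equation 1."""
    return 1.0 / (1.0 + math.exp(-s))


def G(x, v, omega, tau, theta, omega0=0.0, v0=None):
    """Equation 4: direct connection plus weighted sum of perceptron outputs."""
    direct = omega0 + (dot(v0, x) if v0 is not None else 0.0)
    hidden = sum(w * theta(dot(vi, x) - ti) for vi, w, ti in zip(v, omega, tau))
    return direct + hidden


def F(x, v, omega, tau):
    """Equation 3: G with no direct connection and the signed hard limiter.

    The caller is expected to pass binary outgoing weights (0 or 1).
    """
    return G(x, v, omega, tau, signed_limiter)


# Hypothetical network with k = 2 hidden nodes on inputs in R^2.
v = [(1.0, 0.0), (0.0, 1.0)]
tau = [0.5, 0.5]
omega = [1, 1]
for x in [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0)]:
    print(x, F(x, v, omega, tau))
```

With these weights the two hyperplanes are $x_1 = 0.5$ and $x_2 = 0.5$, so the value of `F` records how many positive halfspaces contain the input minus how many negative halfspaces contain it.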

1.3  Analysis:  Generalization 

In this research, there are basically two questions being asked about an artificial
neural network’s ability to generalize about classification data. One is: What size net
is required to accomplish a given classification problem? That is, what is the value
of $k$? The second question is: How large a set of vectors can a given architecture
dichotomize? This division of perspectives is evidenced in the literature described in
Chapter 2. Both of these questions and their relationship will be partially answered
in this research.

It  is  important  to  clarify  the  term  generalization.  In  this  document,  for  an 
ANN to generalize a signed set, $(X^+, X^-)$, means that after training the ANN has
produced  a  function  G  that  implements  the  dichotomies  of  the  training  set  (which 
is  a  signed  set).  Hence,  the  analysis  of  the  capabilities  of  a  family  of  functions,  G, 
would  be  an  investigation  of  an  arbitrary  G’s  ability  to  implement  all  dichotomies  in 
the  training  set.  This  is  also  referred  to  as  the  family’s  separating  capacity.  It  should 
be  noted  that  the  term  generalization  is  sometimes  used  differently  in  the  literature. 
The  definition  presented  above  could  be  referred  to  as  memorization,  in  which  case 
the  analysis  of  the  performance  of  G  to  generalize  would  be  an  investigation  of  G’s 
performance  on  data  not  used  for  training. 
1.4 Synthesis: Achieving Generalization

Achieving generalization, i.e., approximating the parameters $v_i$, $\omega_i$, and $\tau_i$, for
$i = 0, 1, \ldots, k$, is performed by the chosen learning algorithm. The performance of
learning  algorithms  is  not  addressed  by  this  research.  Rather,  there  is  an  assumption 
that  if  a  solution  (best  choice  of  parameters)  exists,  then  it  can  be  learned.  This 
assumption  is  supported  by  Kolmogorov,  Hecht-Nielsen  and  Cybenko  (24)  (18). 

1.5  Validation:  Testing  the  Results 

Validation  is  the  portion  of  the  problem  solving  process  that  deals  with  testing 
the  results  and  determining  its  validity.  Validation  is  not  addressed  directly  since 
the  scope  of  this  research  does  not  involve  a  particular  application  and  its  solution. 
This  should  not  be  confused  with  validating  the  adequacy  of  the  proposed  ANN 
capability  quantifiers  which  will  be  addressed. 

1.6  Conclusions 

In  summary,  this  chapter  provided  a  basis  of  general  knowledge  about  solving 
classification (two-class) problems using ANNs. It also explained what portions of
the  solution  process  are  directly  addressed  as  the  new  research  in  this  dissertation. 

The  second  chapter  details  relevant  literature  to  motivate  technically  the  new 
research  presented  in  the  following  chapters.  Chapter  III  provides  a  re-evaluation 
of  existing  results  which  motivated  extensions  of  that  work.  The  fourth  chapter 
discusses  desired  properties  of  capability  quantifiers  lacking  in  V-C  based  quantifiers. 
Chapter  V  will  introduce  basics  of  lattice  theory  and  combinatorial  geometry,  and 
recast  the  mathematics  of  ANNs  in  the  context  of  lattice  theory.  Chapter  VI  will 
present  a  generalized  framework  in  which  ANN  capability  analysis  can  be  performed. 
Chapter  VII  will  bring  together  the  innovations  of  the  generalized  framework  of 
Chapter  VI  and  the  rich  mathematics  of  Chapter  V  to  define  the  Ox-Cart  dimension 
and the lattice of ANNs. Ideas for follow-on research will be enumerated in Chapter
VIII.  A  summary  is  provided  in  Chapter  IX. 
II. Background

This chapter reviews research on the problem of determining capabilities of
certain artificial neural networks (ANNs). In particular, the architectural requirements
for an ANN to have the capability to approximate arbitrary functions are addressed.
Included  are  methods  to  quantify  the  capability  of  a  class  of  separating  sets  and 
explanations  on  how  these  quantifiers  direct  the  approach  to  solving  classification 
problems.  The  material  presented  in  this  chapter  is  included  to  motivate  the  research 
presented in this dissertation. In particular, the material outlines a technical
progression of research which serves as a catalyst for the results given in the following
chapters. 

This  chapter  discusses  published  research  on  the  Vapnik-Chervonenkis  (V-C) 
dimension  of  a  set  of  separating  surfaces.  This  quantifier  or  mapping  (sometimes 
inaccurately  referred  to  as  a  measure  in  the  literature),  also  known  as  the  separating 
capacity  of  a  class  of  sets,  directs  the  determination  of  the  size  of  the  ANN  required  to 
solve  a  problem  (9).  Valiant’s  work  emphasizes  the  importance  of  an  accurate  ANN 
capability  quantifier  by  showing  that  the  V-C  dimension  also  directs  the  amount  of 
training  data  required  for  adequate  learning  (45).  Learning  is  not  discussed  in  detail 
in  this  document.  Hence,  Valiant’s  results  are  not  presented  in  detail.  However, 
Valiant’s  results  are  mentioned  as  testimony  for  the  usefulness  of  research  related 
to  refining  the  definition  of  V-C  dimension  and  providing  bounds  or  values  for  the 
quantifier,  given  particular  ANN  architectures. 

The technical review provided here includes Cover’s work about a single
separating surface, which could be a hyperplane defined by a perceptron or a nonlinear
surface. Also included are Baum’s results about multiple separating hyperplanes
defined by multiple perceptrons in a feed-forward single hidden-layer perceptron ANN.
Additionally, Sontag’s results about alternative V-C based quantifiers for ANNs are
presented. Before the results are discussed, two concise definitions of V-C dimension
and how V-C dimension relates to ANNs are provided.

2.1  Vapnik-Chervonenkis  Dimension 

Vapnik-Chervonenkis (V-C) dimension is a mapping that describes the
separating capacity of a set of sets. Put another way, V-C dimension is a quantifier of
how  well  a  set  of  sets  can  implement  arbitrary  dichotomies  of  a  signed  set.  The 
set  of  sets  which  are  the  domain  of  the  V-C  dimension  can  simply  be  an  arbitrary 
set of sets, or be derived from a set of functions. Since the set of sets can be
derived from functions, V-C dimension is a useful quantifier of classification capability
of  ANNs.  In  particular,  the  set  of  sets  generated  by  ANNs  are  the  halfspaces  and 
their  intersections  generated  from  the  separating  surfaces,  the  hyperplanes  defined 
by  perceptrons. 

The following are two explicit definitions of V-C dimension. The first is a
general definition defined on a set of sets and is easy to work with mathematically. The
second  is  more  specific  since  it  is  defined  on  a  set  of  sets  generated  from  separating 
surfaces. 

Consider the following general definition of V-C dimension outlined by Wenocur
and Dudley in (47). This definition is not as intuitive as the second, but will provide
the required flexibility for the new research in this dissertation. Given a nonempty
set, $X \subset \mathbb{R}^d$, a collection, $\Omega$, of subsets of $X$, and a finite set $Y \subset X$,
let $\Delta^{\Omega}(Y)$ denote the number of distinct sets $O \cap Y$ for $O \in \Omega$. That is,

$$\Delta^{\Omega}(Y) = \mathrm{card}(\{O \cap Y : O \in \Omega\}) \le 2^{\mathrm{card}(Y)}.$$

Define

$$m^{\Omega}(n) = \max\{\Delta^{\Omega}(Y) : Y \subset X,\ \mathrm{card}(Y) = n\} \le 2^{n}.$$

Notice that $m^{\Omega}(n)$ is no larger than the number of all possible dichotomies of $Y$,
that is, $m^{\Omega}(n) \le 2^{n}$ for all $n$. Define $VC(\Omega)$ as the V-C dimension of the set
of sets $\Omega$ given by

$$VC(\Omega) = \inf\{n : m^{\Omega}(n) < 2^{n}\},$$

with

$$VC(\Omega) = +\infty \quad \text{if } m^{\Omega}(n) = 2^{n} \text{ for all } n.$$
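
For a finite ground set, the quantities in this definition can be evaluated by direct enumeration. The following Python sketch is only an illustration under that finiteness assumption: the names `delta`, `growth`, and `vc_inf` are hypothetical, and the example collection (all integer intervals inside a six-point set, plus the empty set) is chosen here purely for demonstration.

```python
from itertools import combinations


def delta(collection, Y):
    """Number of distinct traces O & Y over all sets O in the collection."""
    Y = frozenset(Y)
    return len({O & Y for O in collection})


def growth(collection, X, n):
    """Largest value of delta over all n-element subsets Y of X."""
    return max(delta(collection, Y) for Y in combinations(X, n))


def vc_inf(collection, X):
    """Smallest n with growth(n) < 2**n, or None if no n <= card(X) qualifies."""
    for n in range(len(X) + 1):
        if growth(collection, X, n) < 2 ** n:
            return n
    return None


# Hypothetical example: X = {0, ..., 5}, collection = all intervals plus the empty set.
X = list(range(6))
intervals = [frozenset(range(a, b + 1)) for a in X for b in X if a <= b]
intervals.append(frozenset())

print([growth(intervals, X, n) for n in range(4)])
print(vc_inf(intervals, X))
```

In this example every two-element subset has all four traces, while a three-element subset $\{x < y < z\}$ can never have the trace $\{x, z\}$, so the growth function first drops below $2^n$ at $n = 3$.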

Now,  we  can  make  a  second  definition  which  is  an  instantiation  of  the  first. 
Consider  the  following  preliminary  definitions. 

Definition. A dichotomy of a set, $X \subset \mathbb{R}^d$, is the partition of its elements into two
disjoint subsets $X^+$ and $X^-$ such that $X^+ \cup X^- = X$ and $X^+ \cap X^- = \emptyset$ and is
denoted by the ordered pair $(X^+, X^-)$. $(X^+, X^-)$ is referred to as a signed set. If
there are $n$ elements in $X$, then there are $2^n$ possible dichotomies of $X$.

Definition. A separating surface, $f$, on a set $X \subset \mathbb{R}^d$ is a function that maps $X$ to
$\mathbb{R}$.

Definition. A dichotomy, $(X^+, X^-)$, of a set $X \subset \mathbb{R}^d$ is implemented by a set,
$\mathcal{F}$, of separating surfaces if there exists $f \in \mathcal{F}$ such that $f(x) > 0$ for all
$x \in X^+$ and $f(x) < 0$ for all $x \in X^-$.

Definition. A set of separating surfaces, $\mathcal{F}$, shatters the set $X$ if every possible
dichotomy of $X$ can be implemented by a separating surface $f \in \mathcal{F}$.

Definition. The set of vectors $X \subset \mathbb{R}^d$ are said to be in general position if every
subset $Y \subset X$ of $d$ or fewer vectors is a linearly independent set.

With the above definitions, V-C dimension can be defined as follows.

Definition. The Vapnik-Chervonenkis (V-C) Dimension of a set of surfaces, $\mathcal{F}$, is
the largest integer $n$, such that there is at least one set in general position,
$X \subset \mathbb{R}^d$ of cardinality $n$ that can be shattered by $\mathcal{F}$.
In other words, the V-C dimension of a set of functions representable by ANNs is the
size of the largest set of data which can be guaranteed to be implemented regardless
of the classification of the points in the data set.

It  is  meaningful  here  to  reiterate  that  when  the  set  of  surfaces,  referred  to 
in  the  above  definitions,  is  the  set  of  hyperplanes  defined  by  the  perceptron  nodes 
in an ANN’s hidden layer, then the V-C dimension of $\mathcal{F}$ is a quantifier about the
capability  of  that  architecture  to  implement  arbitrary  dichotomies  of  a  set  of  vectors. 
The  value  of  the  V-C  dimension  indicates  the  cardinality  of  a  set  of  training  vectors 
which  can  be  guaranteed  to  be  classified  correctly  by  an  ANN  regardless  of  the 
dichotomy. 

As an example in $\mathbb{R}^2$, choose $\Omega$ to be the set of halfspaces generated from
the hyperplane associated with a single perceptron node. The V-C dimension of the
set, $\mathcal{F}$, comprised of the single line in two dimensions, or equivalently the set,
$\Omega$, comprised of the two halfspaces generated by the line, is three. This is because
there is a set of three points in general position which can be shattered and there is
no set of four points in general position which can be shattered. Consequently, it can
be said that the V-C dimension of an ANN with a single perceptron with a hard-limiter
sigmoid, which takes as its input vectors in $\mathbb{R}^2$, is three.
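
The example can be checked numerically for particular point sets. The sketch below is an illustrative brute-force enumeration, not a method from the dissertation: it sweeps the normal direction of a line around the circle, and for each direction between two consecutive critical angles it records every dichotomy obtained by cutting the sorted projections of the points. The function name `line_dichotomies` and the sample points are assumptions made here, and the points are assumed to be distinct.

```python
import math
from itertools import combinations


def line_dichotomies(points):
    """Dichotomies of planar `points` implemented by some line v.x - tau = 0.

    Each dichotomy is returned as the frozenset of indices labelled positive.
    """
    n = len(points)
    # Normal directions at which two points project to the same value.
    critical = set()
    for (ax, ay), (bx, by) in combinations(points, 2):
        base = math.atan2(by - ay, bx - ax) + math.pi / 2
        for shift in (0.0, math.pi):
            critical.add(round((base + shift) % (2 * math.pi), 12))
    if not critical:
        critical.add(0.0)
    angles = sorted(critical)
    found = set()
    for i, a in enumerate(angles):
        b = angles[(i + 1) % len(angles)]
        if b <= a:
            b += 2 * math.pi
        mid = (a + b) / 2  # a direction strictly between two critical angles
        v = (math.cos(mid), math.sin(mid))
        order = sorted(range(n), key=lambda j: v[0] * points[j][0] + v[1] * points[j][1])
        # Any threshold tau between consecutive projections gives one dichotomy.
        for cut in range(n + 1):
            found.add(frozenset(order[cut:]))
    return found


three = [(0, 0), (1, 0), (0, 1)]
four = [(0, 0), (2, 0), (0, 2), (3, 3)]
print(len(line_dichotomies(three)), 2 ** 3)
print(len(line_dichotomies(four)), 2 ** 4)
```

For the three sample points all $2^3$ dichotomies appear, while for the four sample points the two dichotomies that pair opposite corners of the quadrilateral are missing. This checks only these particular point sets; the general statement that no four points can be shattered rests on the argument in the text.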

2.2  Cover:  The  Separating  Capacity  of  a  Surface 

In this section, it will be established that the separating capacity, the number
of vectors that can be separated, of a certain set of nonlinear separating surfaces
having $d$ degrees of freedom is $2d$ vectors (16). The separating capacity is another
term used to express the ability of separating surfaces to dichotomize signed sets.
Hence, Cover (16) equates separating capacity with the V-C dimension. That is, the
V-C dimension of a certain set of nonlinear separating surfaces with $d$ parameters is
$2d$. The surfaces are derived from homogeneous functions, $f_w : \mathbb{R}^d \to \{-1, 0, 1\}$. The
surface is defined as $\{x : f_w(x) = 0\}$, where $w$ is an arbitrary vector of parameters
called the weight vector. The separating capacity of a set of surfaces as defined
by Cover is the number of vectors in a set whose dichotomies can be implemented
with a probability of 1/2. Probability concepts are introduced because the number
of random dichotomies of $n$ points in $d$ dimensions which can be implemented with
certainty is shown to have a cumulative binomial distribution (16).

The vectors of $X$ are assumed to be randomly distributed in a finite dimensional
space. The vectors are assumed to be in general position. A dichotomy of $X$,
$(X^+, X^-)$, is said to be separable relative to a set of surfaces, $\mathcal{F}$, if there
exists a surface, $f \in \mathcal{F}$, that separates the points in $X^+$ from those in $X^-$;
i.e. there exists $f_w \in \mathcal{F}$ such that

$$f_w(x) = \begin{cases} 1 & \text{if } w \cdot x > 0 \quad \forall x \in X^+ \\ 0 & \text{if } w \cdot x = 0 \\ -1 & \text{if } w \cdot x < 0 \quad \forall x \in X^-. \end{cases} \tag{5}$$

2.2.1 The Function Counting Theorem. The Function Counting Theorem
answers the question: How many homogeneously, linearly separable dichotomies of $n$
points in $d$-dimensional space are there? (A set is said to be homogeneously, linearly
separable if there exists an $f_w$ that satisfies Equation 5 and $f_w(0) = 0$.) Consider
the following well-established theorem given here without proof.

Theorem 1 (The Function Counting Theorem) (16:326) There are $C(n, d)$
homogeneously, linearly separable dichotomies of $n$ points in general position in
$d$-dimensional space where

$$C(n, d) = 2 \sum_{k=0}^{d-1} \binom{n-1}{k}.$$
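
The count in Theorem 1 is easy to tabulate. The short Python sketch below (the function name `C` mirrors the theorem's notation; the choice $d = 3$ is arbitrary) evaluates the formula and the fraction of all $2^n$ dichotomies that are homogeneously linearly separable, which illustrates the capacity of $2d$ vectors discussed in Section 2.2.

```python
from math import comb


def C(n, d):
    """Function Counting Theorem: separable dichotomies of n points in d dimensions."""
    return 2 * sum(comb(n - 1, k) for k in range(d))


d = 3
for n in range(1, 9):
    # Columns: n, C(n, d), total dichotomies 2^n, separable fraction.
    print(n, C(n, d), 2 ** n, C(n, d) / 2 ** n)
```

For $n \le d$ every dichotomy is counted, so the fraction is 1; at $n = 2d$ the fraction is exactly 1/2, because the sum then covers half of the binomial coefficients of order $2d - 1$.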

2.2.2 Generalizing to Arbitrary Surfaces. Now, consider similar arguments
for arbitrary separating surfaces. Let $\mathcal{F}$ be a set of arbitrary separating surfaces.
Let $(X^+, X^-)$ be a signed set of $n$ points. Let $\phi$ be a vector-valued mapping
where $\phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_d(x))$ and $x \in \mathbb{R}^d$. Assume that $\phi$ is
dimension preserving.

Definition. The set $\{x : w \cdot \phi(x) = 0\}$ is referred to as a $\phi$-surface.

Definition. A dichotomy of $X$, $(X^+, X^-)$, is $\phi$-separable if there exists
$w \in \mathbb{R}^d$ such that

$$w \cdot \phi(x) > 0 \quad \forall x \in X^+, \qquad w \cdot \phi(x) < 0 \quad \forall x \in X^-.$$

Definition. If $\phi$ is as defined above and $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$, then
$X$ is said to be in $\phi$-general position if every $k$-element subset of the set
$\phi[X] = \{\phi(x_1), \phi(x_2), \ldots, \phi(x_n)\}$ is linearly independent for all $k \le d$.

The following lemma enables the Function Counting Theorem to extend to
arbitrary surfaces, and also to surfaces constrained to pass through a set of points.

Lemma 1 Let $(X^+, X^-)$ be a signed set of points in $\mathbb{R}^d$ and let
$y \notin X$ be a point other than the origin in $\mathbb{R}^d$. Then, the dichotomies
$(X^+ \cup \{y\}, X^-)$ and $(X^+, X^- \cup \{y\})$ are both homogeneously linearly
separable if and only if $(X^+, X^-)$ is homogeneously linearly separable by a
$(d-1)$-dimensional subspace containing $y$. (16:327)

Geometrically  speaking,  Lemma  1  provides  that  a  new  point  can  be  adjoined  to 
either  half  of  a  linearly  separable  dichotomy  to  form  two  new  separable  dichotomies 
if  and  only  if  there  exists  a  separating  hyperplane  containing  the  new  point  that 
separates  the  original  dichotomy. 

Sufficient  background  has  been  established  to  present  Cover’s  major  result. 

Theorem 2 If a $\phi$-surface, $\{x \in X : w \cdot \phi(x) = 0\}$, is constrained to
contain the set of points $Y = \{y_1, y_2, \ldots, y_k\}$, where
$\{\phi(y_1), \phi(y_2), \ldots, \phi(y_k)\}$ is linearly independent and where the
projection of $\phi(x_1), \phi(x_2), \ldots, \phi(x_n)$ onto the orthogonal subspace to
the space spanned by $\{\phi(y_1), \phi(y_2), \ldots, \phi(y_k)\}$ is in general
position, then there are $C(n, d-k)$ $\phi$-separable dichotomies of $X$. (16:327)

2.2.3  Separability  of  Random  Patterns.  There  are  two  different  notions  of 
randomness  associated  with  the  classification  problem. 

•  The  set  X  is  fixed  in  position,  but  the  vectors  in  X  are  classified  independently 
with  equal  probability  into  one  of  two  classes. 

•  The set X, itself, is randomly distributed in space and the desired dichotomization
may be random or fixed.

Either way, the separability of the signed set becomes a random event independent
of the dichotomy and the geometric configuration. This leads to two questions:
What is the probability of being able to implement an arbitrary dichotomy? and What
is the maximum number of points which can be separated by a family of $\phi$-surfaces?

Suppose that $X = \{x_1, x_2, \ldots, x_n\}$ is fixed and a dichotomy is chosen at
random with equal probability from the $2^n$ possible dichotomies of $X$. Let $X$ be
in $\phi$-general position with probability 1, and let $P(n,d)$ denote the probability
that a random dichotomy is $\phi$-separable. Again, $d$ denotes the degrees of freedom
of $\phi$ or equivalently the dimension of the image space under $\phi$. Then, from
Theorem 2, there are $C(n,d)$ $\phi$-separable dichotomies of the $2^n$ total number of
dichotomies. Hence,

\[
P(n,d) = \frac{1}{2^n} C(n,d) = \frac{1}{2^{n-1}} \sum_{k=0}^{d-1} \binom{n-1}{k}. \tag{6}
\]

Equation  (6)  is  the  cumulative  binomial  distribution  corresponding  to  d  -  1  or  fewer 
successes  of  n  —  1  trials  where  the  event  space  is  binary.  This  answers  the  first 
question. 
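To make Equation (6) concrete, the probability can be evaluated exactly with rational arithmetic. This is a minimal Python sketch; the name `separable_probability` is hypothetical, and the formula used is $P(n,d) = 2^{-(n-1)}\sum_{k=0}^{d-1}\binom{n-1}{k}$.

```python
from fractions import Fraction
from math import comb


def separable_probability(n: int, d: int) -> Fraction:
    """P(n, d): probability that a uniformly random dichotomy of n points
    in general position is separable by a surface with d degrees of freedom."""
    # Cumulative binomial: at most d - 1 successes in n - 1 fair trials.
    return Fraction(sum(comb(n - 1, k) for k in range(d)), 2 ** (n - 1))


if __name__ == "__main__":
    print(separable_probability(3, 3))  # 1     (n <= d: always separable)
    print(separable_probability(4, 2))  # 1/2   (n = 2d)
    print(separable_probability(6, 2))  # 3/16  (n well above 2d)
```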

The utility of Cover’s material actually lies in the second question which is
answered by Cover’s following results. What is the largest n such that P(n,d) = 1?
(This is, in fact, the question also answered by the works of Baum and Sontag which
are explained in later sections of this chapter.) In other words, what is the largest
cardinality of a set such that any randomly selected dichotomy can be implemented
with probability 1 by a $\phi$-surface with d degrees of freedom?

Let $\{x_1, x_2, \ldots\}$ be a set of random vectors in general position and define
the random variable, $N$, to be the largest integer such that
$\{x_1, x_2, \ldots, x_N\}$ is $\phi$-separable where $\phi$ has $d$ degrees of freedom.
Then, since Equation (6) is the cumulative binomial distribution, the probability that
$N = n$ is the difference in the probabilities that $N \ge n$ and $N \ge n + 1$, or

\[
\Pr\{N = n\} = P(n,d) - P(n+1,d) = \frac{1}{2^n} \binom{n-1}{d-1}, \qquad n = 1, 2, \ldots
\]

This is commonly known as the negative binomial distribution with probability of
failure equal to $1/2$. In this scenario, $n$ is the number of trials required before
$d$ failures of a binary experiment are expected to be generated, and

\[
E(n) = 2d, \qquad \mathrm{Median}(n) = 2d.
\]
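The distribution of $N$ can be checked numerically from Equation (6) alone. The sketch below is in Python and uses the hypothetical names `p_sep`, `pmf_N` and `truncated_mean`. It forms $\Pr\{N = n\}$ as the difference $P(n,d) - P(n+1,d)$ and sums a truncated series for the mean. The truncation point `n_max` is an arbitrary choice made for this illustration.

```python
from fractions import Fraction
from math import comb


def p_sep(n: int, d: int) -> Fraction:
    """P(n, d) from Equation (6)."""
    return Fraction(sum(comb(n - 1, k) for k in range(d)), 2 ** (n - 1))


def pmf_N(n: int, d: int) -> Fraction:
    """Pr{N = n} taken as the difference P(n, d) - P(n + 1, d)."""
    return p_sep(n, d) - p_sep(n + 1, d)


def truncated_mean(d: int, n_max: int = 400) -> Fraction:
    """Partial sum of n * Pr{N = n}; the neglected tail decays geometrically."""
    return sum((n * pmf_N(n, d) for n in range(1, n_max + 1)), Fraction(0))


if __name__ == "__main__":
    d = 3
    # The difference agrees with the closed form binom(n-1, d-1) / 2**n.
    print(pmf_N(5, d) == Fraction(comb(4, d - 1), 2 ** 5))
    # The truncated mean is numerically close to 2d.
    print(float(truncated_mean(d)))
```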

Here $E(\cdot)$ is the expected value operator. If $d$ is chosen carefully with
respect to $n$, then the asymptotic probability that $n$ vectors are separable by a
$\phi$-surface of $d$ degrees of freedom appears like the cumulative normal
distribution. Specifically, let $\alpha$ be any real number then choose $d$
approximately as follows: $d \approx \frac{n}{2} + \frac{\alpha}{2}\sqrt{n}$.
Then,

\[
P(n,d) \to \Phi(\alpha)
\]

as $n \to \infty$, where $\Phi(\alpha)$ is the cumulative normal distribution

\[
\Phi(\alpha) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\alpha} e^{-x^2/2}\, dx.
\]

In other words, the ratio of $n$ to $2d$ approaches 1 asymptotically as $n$ grows large.

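The most interesting observation about $P(\cdot, \cdot)$ is that for
$0 < \epsilon < 1$ and for all $d \in \mathbb{N}$,

\[
\lim_{d \to \infty} P(2d(1+\epsilon), d) = 0, \qquad
P(2d, d) = \tfrac{1}{2}, \qquad
\lim_{d \to \infty} P(2d(1-\epsilon), d) = 1.
\]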

This was shown by Winder (49). The threshold effect, where the number of vectors
equals twice the number of degrees of freedom of the separating surface, suggests that
2d is the separating capacity of a set of separating surfaces having d degrees of freedom
(16:331). Bringing this section together is the fact that this separating capacity, the
maximum cardinality of a set whose random dichotomies can be implemented with
probability 1 by a set of separating surfaces having d degrees of freedom, is the V-C
dimension of the set of separating surfaces.
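The threshold effect at $n = 2d$ can be observed numerically from Equation (6). The following Python sketch uses the hypothetical name `p_sep`; the sample values of $d$ and the choices $\epsilon = 1/2$ (above capacity) and $\epsilon = 1/4$ (below capacity) are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def p_sep(n: int, d: int) -> Fraction:
    """P(n, d) from Equation (6), kept exact with rational arithmetic."""
    return Fraction(sum(comb(n - 1, k) for k in range(d)), 2 ** (n - 1))


if __name__ == "__main__":
    for d in (10, 100, 400):
        at_capacity = p_sep(2 * d, d)   # n = 2d
        above = p_sep(3 * d, d)         # n = 2d(1 + eps) with eps = 1/2
        below = p_sep(3 * d // 2, d)    # n = 2d(1 - eps) with eps = 1/4
        print(d, at_capacity, float(above), float(below))
```

At $n = 2d$ the value is exactly $1/2$ for every $d$, while the values above and below capacity move toward 0 and 1 respectively as $d$ grows.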

An important caveat to remember about Cover’s work is that it is about the
abilities of one separating surface from a particular family to separate a dichotomy.
The next section will outline research that extended Cover’s work to multiple
separating surfaces for the purpose of quantifying artificial neural network capabilities.

2.3  Baum:  Artificial  Neural  Network  Architecture  Size  for  Arbitrary  Classification 

In the last section, the separating ability of one surface was established. In this
section, the separating ability of multiple surfaces will be investigated. In particular,
the results of Baum (9) show that the V-C dimension of a feed-forward, single
hidden-layer, perceptron ANN is finite. Moreover,

\[
2 \left\lfloor \frac{N_1}{2} \right\rfloor d \le VC \le 2 N_w \log_2(e N_N),
\]

where

$N_1$ = number of hidden-layer nodes
$d$ = dimension of input space
$N_w$ = total number of weights in the ANN
$N_N$ = total number of nodes in the ANN
$e$ = Euler’s number
$VC$ = V-C dimension of the ANN.

Note that the lower bound is approximately equal to the number of weights
connecting the input layer to the hidden layer of the ANN. The upper bound is not
much more than twice the number of weights in the ANN. Hence, a rough estimate
of the V-C dimension of the ANN could be the total number of weights in the ANN.
Vapnik and Chervonenkis’ results suggest that the number of vectors in the training
set required for learning is at least the value of the V-C dimension (46). The
background of this result is what is pertinent to the research proposed in this document.
The extension of Cover’s work to feed-forward, single hidden-layer, perceptron ANNs
provides an interesting perspective on the capabilities of ANNs (9).
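For a feel of the magnitudes involved, the two bounds can be evaluated for a small architecture. The Python sketch below reads the bounds as $2\lfloor N_1/2 \rfloor d \le VC \le 2N_w\log_2(eN_N)$. The function name `baum_vc_bounds` is hypothetical, and the way weights and nodes are counted in the example (thresholds counted as weights, nodes taken as hidden plus output units) is an assumption of this sketch, not a convention taken from (9).

```python
from math import e, log2


def baum_vc_bounds(n_hidden: int, d: int, n_weights: int, n_nodes: int):
    """Return (lower, upper) for the V-C dimension of a single hidden-layer net."""
    lower = 2 * (n_hidden // 2) * d               # 2 * floor(N_1 / 2) * d
    upper = 2 * n_weights * log2(e * n_nodes)     # 2 * N_w * log2(e * N_N)
    return lower, upper


if __name__ == "__main__":
    d, n_hidden = 10, 20
    # Assumed counting: each hidden unit has d weights plus a threshold,
    # and the output unit has n_hidden weights plus a threshold.
    n_weights = n_hidden * (d + 1) + (n_hidden + 1)
    n_nodes = n_hidden + 1
    lower, upper = baum_vc_bounds(n_hidden, d, n_weights, n_nodes)
    print(lower, round(upper, 1))
```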

In the following lemma and theorem provided by Baum, the ANN is a
feed-forward, single hidden-layer, perceptron ANN where the perceptrons have
hard-limiters as the sigmoid function. The set to be shattered, V, has n points which
are in general position in Euclidean d-space. The lemma establishes, by
counterexample, that there must be n/d perceptron units in order to shatter the n points.
The theorem provides the sufficiency condition for an ANN to be able to shatter a
set of n points.

Lemma  2  Any  net  capable  of  arbitrary  dichotomies  of  n  points  in  d  dimensions 
must  have  at  least  n/d  nodes  in  its  hidden-layer.  (9:198) 

The next theorem proves that a single hidden-layer ANN can in fact achieve
arbitrary dichotomies with $\lceil n/d \rceil$ hidden-layer nodes.

Theorem 3 A feed-forward, single hidden-layer, perceptron ANN with
$\lceil n/d \rceil$ hidden-layer units can compute an arbitrary dichotomy on n
d-dimensional vectors in general position. (9:199)

Through counting arguments, Baum also establishes lower bounds on the number
of hidden-layer nodes and the number of connection weights. In other words,
the lower bounds are bounds under which there is no guarantee that an arbitrary
dichotomy can be separated. Additionally, Baum consistently uses the notion of a
worst-case geometric arrangement of a signed set to prove his results. The worst-case
arrangement can be described as an n-gon in $\mathbb{R}^2$, where the vertices are the
points in the signed set and have an alternating sign. (9:200-204)

In summary, Baum has specified the domain of V-C dimension to a set of
functions computable by artificial neural networks, specifically feed-forward, single
hidden-layer, perceptron ANNs with hard-limiters at each node. Moreover, he uses
this to answer the question: How many perceptron nodes are required to guarantee
shattering the sets and how many training samples are required for adequate learning?

2.4  Sontag:  Alternative  Classification  Capabilities  Quantifiers 

The research that is described in this section addresses two questions that
were begging to be answered in Baum’s work. One is: What if a standard sigmoid
function is used at each node instead of the Heaviside function? The other question
is: Is V-C dimension the appropriate measurement of ANN capability and if not,
what is? In fact, Sontag shows that using a continuous sigmoid instead of a
hard-limiter appears to double the neural network capabilities. A side result of Sontag’s
is that including a direct connection also improves the capability of an ANN if it is
solving a classification problem but not if it is solving an interpolation problem. The
theme here is to evaluate some of the nuances of certain artificial neural network
architectures in terms of V-C dimension. However, there is more. Sontag raises an
important point that is the basis of the new research described in this dissertation.
Sontag’s point is that V-C dimension is not always the appropriate quantifier of
ANN capabilities. (All of the major results in this section are drawn from Sontag
(39).)


2.4.1 Sontag’s Notation and Definitions. Recall previous definitions of
dichotomy of a set X, shattering, and V-C dimension of a set of functions
$\mathcal{F}$ (see section 2.1). Consider the following definitions of other mappings
of capability which are based on V-C dimension.

Definition. Let $\mathcal{F}$ be a set of scalar-valued functions defined on
$\mathbb{R}^d$. Define the mapping $\overline{\mu}(\mathcal{F})$ to be the largest
integer $n \ge 1$ (possibly $\infty$) such that there is at least one set $X$ of
cardinality $n$ in $\mathbb{R}^d$ which can be shattered by $\mathcal{F}$.

Definition. Let $\mathcal{F}$ be a set of scalar-valued functions defined on
$\mathbb{R}^d$. Define the mapping $\underline{\mu}(\mathcal{F})$ to be the largest
integer $n \ge 1$ (possibly $\infty$) such that every set $X$ of cardinality $n$ can be
shattered.

The utility of both of these quantifiers should be evident. Note that the first
mapping is the V-C dimension less one, that is
$\overline{\mu}(\mathcal{F}) = VC(\mathcal{F}) - 1$ for any $\mathcal{F}$. The
second mapping appears like a more appropriate quantifier for determining ANN
capability since it is the cardinality of sets in which all sets are guaranteed to be
“shatterable”, not just one set. Both of these mappings are fairly extreme, however.
Hence, Sontag suggests another quantifier which is a more robust version of
$\underline{\mu}$.

Definition. Let $\mathcal{F}$ be a set of scalar-valued functions defined on
$\mathbb{R}^d$. Define the mapping, $\mu(\mathcal{F})$, to be the largest integer
$n \ge 1$ (possibly $\infty$) for which the set of sets that can be shattered by
$\mathcal{F}$ is dense, in the sense that given every $n$-element set
$X = \{x_1, x_2, \ldots, x_n\}$, there are points $\tilde{x}_i$ arbitrarily close to
their respective $x_i$ such that $\tilde{X} = \{\tilde{x}_1, \tilde{x}_2, \ldots,
\tilde{x}_n\}$ can be shattered by $\mathcal{F}$.

Note that this is a topological method of assuring data points are in general
position. Also, note that
$\underline{\mu}(\mathcal{F}) \le \mu(\mathcal{F}) \le \overline{\mu}(\mathcal{F})$.
As an example, let $\mathcal{F}$ be the set of affine functions on $\mathbb{R}^2$:
$f(x) = a x_1 + b x_2 + c$, i.e.,

\[
\mathcal{F} = \{ f : \mathbb{R}^2 \to \mathbb{R} \mid f(x) = a x_1 + b x_2 + c,\ a, b, c \in \mathbb{R} \}.
\]

Any dichotomy of a set of three points, which are not collinear in $\mathbb{R}^2$, can
be separated by a line. Hence, $3 \le \mu(\mathcal{F})$. However, the famous XOR
problem is an example of a case of 4 points which can not be shattered. Hence,
$\mu(\mathcal{F}) = 3$. In fact, there is no set of 4 points which can be shattered
which implies $\overline{\mu}(\mathcal{F}) = 3$. Finally, if there is a set of 3 points
which cannot be shattered (3 collinear points), then $\underline{\mu}(\mathcal{F}) = 2$.
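The three cases in this example (a non-collinear triple, the XOR configuration, and a collinear triple) can be checked by brute force. The Python sketch below is an illustration under stated assumptions: the names `separable` and `shattered` are hypothetical, the points are assumed distinct, and separability is tested by trying one direction $w$ inside every arc between consecutive "critical" directions (those perpendicular to a difference of two points), since the ordering of the projections $w \cdot x$ cannot change inside such an arc.

```python
from itertools import product
from math import atan2, cos, pi, sin


def separable(points, labels):
    """True if some affine f(x) = a*x1 + b*x2 + c is positive on the +1 points
    and negative on the -1 points (points are distinct pairs in the plane)."""
    pos = [p for p, s in zip(points, labels) if s > 0]
    neg = [p for p, s in zip(points, labels) if s < 0]
    if not pos or not neg:
        return True  # a constant function of the right sign suffices
    # Critical directions: perpendicular to the difference of two points.
    crit = set()
    for i, (x1, y1) in enumerate(points):
        for (x2, y2) in points[i + 1:]:
            base = atan2(y2 - y1, x2 - x1) + pi / 2
            crit.add(round(base % (2 * pi), 12))
            crit.add(round((base + pi) % (2 * pi), 12))
    angles = sorted(crit)
    for k, a in enumerate(angles):
        last = k == len(angles) - 1
        b = angles[0] + 2 * pi if last else angles[k + 1]
        if b - a < 1e-9:
            continue  # skip degenerate arcs created by rounding
        t = (a + b) / 2
        w = (cos(t), sin(t))
        proj_neg = max(w[0] * p[0] + w[1] * p[1] for p in neg)
        proj_pos = min(w[0] * p[0] + w[1] * p[1] for p in pos)
        if proj_neg < proj_pos:
            return True  # a threshold c between the two values separates
    return False


def shattered(points):
    """True if every dichotomy of the points is separable by an affine function."""
    return all(separable(points, labels)
               for labels in product((1, -1), repeat=len(points)))


if __name__ == "__main__":
    print(shattered([(0, 0), (1, 0), (0, 1)]))          # True
    print(shattered([(0, 0), (1, 1), (1, 0), (0, 1)]))  # False (XOR)
    print(shattered([(0, 0), (1, 0), (2, 0)]))          # False (collinear)
```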

2.4.2 Sontag’s Classification Results. Given an artificial neural network
with d-dimensional inputs, k hidden nodes with sigmoid functions, $\theta$, on each
hidden node, then $\mathcal{F}$ is specifically defined as described by Equation 3. Let
$\mu(k, \theta, d)$ denote $\mu(\mathcal{F})$, where $\mathcal{F}$ is the set of all
(k, d)-ANNs defined by $\mathcal{F}$. Similar notation will be used for
$\underline{\mu}$ and $\overline{\mu}$. If direct connections are included, the
notation will be $\mu^d(k, \theta, d)$. The superscript d denotes the existence of a
direct connection. Again, similar notation will be used for $\underline{\mu}$ and
$\overline{\mu}$.

The  main  results  of  classification  capabilities  in  terms  of  established  bounds 
on  the  three  quantifiers  will  be  presented  here.  First,  consider  the  following  two 
lemmas  that  will  ease  notation  and  provide  immediate  evidence  of  the  main  results. 
The  proofs  of  these  lemmas  can  be  found  in  the  reference  (39:26). 

Lemma 3 For each $k$, $\theta$, $d$,
$\underline{\mu}(k, \theta, d) = \underline{\mu}(k, \theta, 1)$ and
$\underline{\mu}^d(k, \theta, d) = \underline{\mu}^d(k, \theta, 1)$.

Lemma 3 provides the independence of input dimension in the results in
Theorem 4 below. That is, $\underline{\mu}(k, \theta, d) = \underline{\mu}(k, \theta)$
and $\underline{\mu}^d(k, \theta, d) = \underline{\mu}^d(k, \theta)$.

Lemma 4 For any sigmoid $\theta$, and for each $k$ and $d$,

\[
\underline{\mu}(k+1, \theta, d) \ge \underline{\mu}^d(k, \mathcal{H}, d)
\]

and similarly for $\overline{\mu}$ and $\mu$.

The  main  results  are  given  in  Theorems  4,  5,  and  6  below.  These  theorems 
establish  bounds  on  each  of  Sontag’s  V-C  based  mappings.  Each  result  and  proof 
can  be  found  in  (39). 

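Theorem 4 For any sigmoid $\theta$, and for each $k$,

\[
\underline{\mu}(k, \mathcal{H}) = k + 1, \qquad
\underline{\mu}^d(k, \mathcal{H}) = 2k + 2, \qquad
\underline{\mu}(k, \theta) \ge 2k.
\]

Theorem 5 For each $k$,

\[
4 \left\lfloor \frac{k}{2} \right\rfloor \le \mu(k, \mathcal{H}, 2) \le 2k + 1, \qquad
\mu^d(k, \mathcal{H}, 2) \le 4k + 3.
\]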

The first inequality in Theorem 5 follows from Baum (9). Actually, the bound
established by Baum is for $\mu(k, \mathcal{H}, d)$ (and, therefore, for
$\overline{\mu}$ also) for all $d$.

Theorem 6 For any sigmoid $\theta$, and for each $k$,

\[
2k + 1 \le \overline{\mu}(k, \mathcal{H}, 2), \qquad
4k + 3 \le \overline{\mu}^d(k, \mathcal{H}, 2), \qquad
4k - 1 \le \overline{\mu}(k, \theta, 2).
\]

Because  of  Lemma  4,  the  last  statements  of  Theorems  4  and  6  are  consequences 
of  the  previous  two. 

There are two interesting behaviors discovered by Sontag. Notice the apparent
effect of a direct connection on the value of all of the mappings. In each case,
the upper bound approximately doubles. With the use of a standard sigmoid, the
mappings exhibit similar behavior.
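The arithmetic behind these observations can be tabulated. The Python sketch below takes the values in Theorems 4 and 6 to be $k+1$, $2k+2$, $2k$ for the "every set" quantifier and $2k+1$, $4k+3$, $4k-1$ for the "at least one set" quantifier, and checks that the sigmoid statements follow from the direct-connection statements through Lemma 4 applied at $k-1$. The function names and dictionary keys are hypothetical labels chosen for this illustration.

```python
def sontag_bounds(k: int) -> dict:
    """Values and lower bounds as read from Theorems 4 and 6 for k hidden units."""
    return {
        "under_H": k + 1,                # every-set quantifier, Heaviside
        "under_H_direct": 2 * k + 2,     # same, with direct connections
        "under_sigmoid_lb": 2 * k,       # lower bound for any sigmoid
        "over_H_lb": 2 * k + 1,          # one-set quantifier, d = 2
        "over_H_direct_lb": 4 * k + 3,   # same, with direct connections
        "over_sigmoid_lb": 4 * k - 1,    # lower bound for any sigmoid, d = 2
    }


def via_lemma_4(k: int) -> tuple:
    """Lemma 4: a (k)-unit sigmoid net is at least as capable as a
    (k - 1)-unit Heaviside net with direct connections."""
    prev = sontag_bounds(k - 1)
    return prev["under_H_direct"], prev["over_H_direct_lb"]


if __name__ == "__main__":
    for k in (2, 5, 10):
        b = sontag_bounds(k)
        print(k, b["under_H_direct"] / b["under_H"],
              b["over_H_direct_lb"] / b["over_H_lb"], via_lemma_4(k))
```

The first ratio is exactly 2 and the second tends to 2 as $k$ grows, which is the doubling effect of a direct connection noted above.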
2.5 Other Relevant Research

This section is included to emphasize the on-going struggle to capture the
essence of artificial neural network capabilities through the Vapnik-Chervonenkis
dimension. These efforts are produced for the purposes of getting a handle on required
amounts of training data and the size of the net, which appear to be intricately
related. Moreover, with evolving research, a new technique is born for establishing
bounds. Studying these techniques will prove useful for answering questions that
arise from the ideas proposed in this dissertation. Consider the following testimonies
of continued V-C dimension research.

2.5.1 V-C Dimension of Multi-Hidden-Layer ANNs. Peter L. Bartlett has
recently attacked the problem of finding lower bounds of V-C dimension for
“multi-hidden-layer” ANNs (8). His results apply to the feed-forward architecture with the
Heaviside function as the sigmoid function for each node and the final output of
the ANN being binary. For a two hidden-layer ANN, the V-C dimension is at least
equivalent to the number of connections from the input layer to other units plus one.
For a three hidden-layer ANN, the results are given in the following theorem.

Theorem 7 (8) Let $M$ denote the set of functions represented by a three-layer,
completely connected architecture with $k_0 > 0$ input units, $k_1 > 0$ first
hidden-layer units, $k_2 > 0$ second hidden-layer units, and a single output unit where
$k_0, k_1, k_2 \in \mathbb{N}$.

(a) If $k_0 \ge k_1$ and $k_2 \le 2^{k_1} / (k_1^2/2 + k_1/2 + 1)$, then

\[
VC(M) \ge k_0 k_1 + k_1 (k_2 - 1) + 1.
\]

(b) If $1 < k_0 < k_1$ and $k_2 \le k_1$, then

\[
VC(M) \ge k_0 k_1 + k_1 (k_2 - 1)/2 + 1.
\]
The proof of this theorem was not included in that paper. However, an
explanation of his counting arguments was included. Bartlett’s reasoning is based on a
defining set for a unit, u.

Definition. A set $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^{k_0}$ is a
defining set for a unit, $u$, in a feed-forward, multi-hidden-layer, perceptron ANN
with $k_0$-dimensional, real-valued inputs if:

•  The points in $X$ can be classified in each of $2^n$ distinct ways by slightly
perturbing the weights and threshold of unit $u$.

•  Slightly perturbing the weights and threshold of units other than $u$ will not
affect the classification of the points in $X$.

Definition. A point $x \in \mathbb{R}^{k_0}$ is an oblivious point for the network if
the classification of $x$ is unaffected by sufficiently small perturbations of the
network weights.

The next theorem lays the groundwork for Bartlett’s main results. It is
important to note that this theorem is based on the existence of defining sets and an
oblivious point.

Theorem 8 Let $M$ be the set of functions represented by a feed-forward,
multi-hidden-layer, perceptron ANN. Consider a set of processing units $U$ in this
architecture and assume $M$ has an oblivious point. If there is a finite defining set
$S_u$ for each unit $u$ in $U$, then

\[
VC(M) \ge \sum_{u \in U} \mathrm{card}(S_u) + 1.
\]

Bartlett  notes  that  the  result  implies  that  the  sample  size  must  increase  at  least 
linearly  with  the  number  of  weights  to  guarantee  the  data  set  can  be  implemented. 
He  also  notes  that  these  lower  bounds  hold  for  architectures  with  sigmoids.  However, 
based  on  Sontag’s  work,  there  is  a  good  chance  that  this  lower  bound  for  sigmoids 
could  be  tightened. 
2.5.2 Hints and the V-C Dimension. In May 1993, Abu-Mostafa published
a paper that used V-C dimension analysis to prove the benefits of incorporating hints
into the learning process (2). Hints are known properties about the function that is
being approximated by the artificial neural network. Hints reduce the class of
possible functions that match the known examples. When using hints, it is appropriate
to include the information in the analysis of training data requirements and ANN
architecture size through V-C dimension.

The major result of this paper is that the introduction of hints does affect the V-C
dimension. To show this, Abu-Mostafa defines a new quantity to represent the V-C
dimension  of  a  hint.  In  actuality,  this  is  research  written  from  the  learnability  side  of 
the  capabilities  of  ANNs,  and,  although  it  is  related  to  the  subjects  in  this  document, 
the  details  are  not  directly  related.  The  point  of  including  the  paper  here  is  that  it  is 
yet  another  example  of  researchers  using  the  well-established  tool  (V-C  dimension)  to 
show  the  capability  benefits  of  a  customized  feature  of  an  architecture.  An  additional, 
interesting  point  of  Abu-Mostafa’s  work  is  that  it  required  the  customization  of  the 
definition  of  V-C  dimension. 

Abu-Mostafa’s research lays additional groundwork for considering more radical
alterations of the mappings to serve the purpose of customizing the capability
quantifiers for specific problem sets. This is the central premise of the research
outlined in this dissertation.

2.6 Conclusions

In summary, this chapter has provided enough literature review to put into
perspective the problems associated with quantifying capabilities of artificial neural
networks to generalize about classification data. In particular, this chapter
concentrated on the feed-forward, single hidden-layer, perceptron artificial neural network.
The background material was presented specifically to technically motivate the
approach used in the new research in Chapters IV, V, VI and VII.
III. A Formula for Evaluating the V-C Dimension of Artificial Neural Networks

In this chapter, there is an in-depth investigation of existing proofs of
evaluations of the Vapnik-Chervonenkis (V-C) dimension as stated by Baum (9) and
Sontag (40). Proofs of stronger results will be given. Moreover, for fixed cardinality
of a signed set, a formula for determining the required number of perceptron nodes
in the hidden-layer of a feed-forward, single hidden-layer perceptron ANN is given.

Additionally, new insight into the duality of answering ANN capability
questions is presented. The primal problem can be stated as follows: Given an ANN
architecture, what is the maximum cardinality of the classification problem which can
be shattered with that architecture? The dual problem can be stated as follows: Given
a classification problem, what is required of an ANN architecture in order to provide
an implementation? Intuitively, there would appear to be an inverse relationship.
The details of this relationship are also stated in this chapter.

3.1  Clarifications  and  Extensions  of  Baum’s  Research 

3.1.1 Baum’s research revised. An unstated assumption of Baum’s work
(outlined in Section 2.3), is that the chambers of the space, $\mathbb{R}^d$, that are
created by the hyperplanes associated with each hidden node can be represented by a
feed-forward, single hidden-layer, perceptron ANN. Consider the function

\[
F(x) = \sum_{i=1}^{k} \omega_i \mathcal{H}(v_i \cdot x - \tau_i).
\]

If $\omega_i \in \{0, 1\}$, then these weights are thought of as logic variables, i.e.,
they indicate one side of a hyperplane or the other. Hence, every point in
$\mathbb{R}^d$ (not on a hyperplane) can be identified with a unique chamber by the set
of values $\{\omega_1, \omega_2, \ldots, \omega_k\}$.
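The function $F$ and the chamber labelling can be illustrated with a short computation. The Python sketch below is an interpretation made for illustration only: it labels a chamber by the tuple of hidden-unit outputs $\mathcal{H}(v_i \cdot x - \tau_i)$, takes $\mathcal{H}(t) = 1$ for $t > 0$ and $0$ otherwise, and uses the hypothetical names `chamber_code` and `net_output`.

```python
def heaviside(t: float) -> int:
    """Hard limiter; the value at t = 0 is a convention assumed here."""
    return 1 if t > 0 else 0


def chamber_code(x, V, tau):
    """Tuple of hidden-unit outputs H(v_i . x - tau_i), one entry per hyperplane."""
    return tuple(
        heaviside(sum(vi * xi for vi, xi in zip(v, x)) - t)
        for v, t in zip(V, tau)
    )


def net_output(x, omega, V, tau):
    """F(x) = sum_i omega_i * H(v_i . x - tau_i)."""
    return sum(w * h for w, h in zip(omega, chamber_code(x, V, tau)))


if __name__ == "__main__":
    # Two hyperplanes in the plane: x1 = 0 and x2 = 0 (four chambers).
    V, tau = [(1, 0), (0, 1)], [0, 0]
    for x in [(1, 1), (-1, 1), (-1, -1), (1, -1)]:
        print(x, chamber_code(x, V, tau), net_output(x, (1, 1), V, tau))
```

Points in different chambers of this arrangement receive different codes, while the scalar output $F(x)$ may coincide across chambers.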
Additionally, Baum’s work, related to the primal problem, assumes the “worst
case” arrangement of signed sets, geometrically, in order to prove Lemma 2. By
refining “worst case” logic, stronger formulas for evaluating the V-C dimension can
be achieved. Consider the following new refinement to Lemma 2 and Theorem 3.
Let $P(n, \mathcal{H}, d)$ denote the minimum number of ANN hidden-layer nodes, with
the Heaviside function, $\mathcal{H}$, as the sigmoid, required to guarantee that the
ANN can correctly classify an arbitrary arrangement and coloring of $n$ vectors in
$\mathbb{R}^d$.

Theorem 9 Given $n$ vectors in $\mathbb{R}^d$ which are in general position, then

\[
P(n, \mathcal{H}, d) = \begin{cases}
\left\lceil \dfrac{n}{d} \right\rceil & \text{for } n \text{ even} \\[2mm]
\left\lceil \dfrac{n-1}{d} \right\rceil & \text{for } n \text{ odd.}
\end{cases}
\]

Proof. Let $X$ be a finite set in $\mathbb{R}^d$ with $\mathrm{card}(X) = n$. Let
$(X^+, X^-)$ be a dichotomy of $X$. Form line segments that connect each point in
$X^+$ with its nearest neighbor in $X^-$, and vice versa. Note that a “worst case”
arrangement of $(X^+, X^-)$ would result in $n$ line segments if $n$ is even. If $n$ is
odd, then the “worst case” arrangement would result in only $n - 1$ such line segments.
This is due to $\mathrm{card}(X^+) < \mathrm{card}(X^-)$ or
$\mathrm{card}(X^-) < \mathrm{card}(X^+)$ resulting in a redundant line segment.

The problem of implementing the dichotomy of the signed set is reduced to
placing hyperplanes in the space so that each line segment is intersected. In $d$
dimensions, a hyperplane can not be guaranteed to cut more than $d$ line segments.
Therefore, in the case $n$ even, $\lceil n/d \rceil$ hyperplanes are required to
intersect these line segments. In the case of $n$ odd, $\lceil (n-1)/d \rceil$
hyperplanes are required. This provides the necessary condition. Theorem 3 gives the
sufficient condition. In other words, $\lceil n/d \rceil$ hyperplanes are sufficient to
implement the dichotomy for even $n$, and $\lceil (n-1)/d \rceil$ hyperplanes are
sufficient for odd $n$. □
Figure 2. Even and odd n-gons (panels: n even, n odd).


This  proof  is  a  slight  modification  of  Baum’s  proof  of  Lemma  2  (9).  See  Figure 
2  for  a  graphical  view  of  the  proof  for  d  =  2.  Figure  2  illustrates  the  difference  in 
ANN  requirements  given  an  odd  number  of  points  versus  an  even  number  of  points. 

Theorem  9  has  provided  a  closed  form  relation  between  the  cardinality  of  signed 
sets  to  be  separated  and  the  number  of  processors  in  an  ANN  hidden-layer  required 
to  perform  the  separation.  Without  breaking  the  cases  into  even  and  odd  n,  equality 
could  not  be  established.  This  seems  to  be  a  small  improvement  to  Baum’s  work. 
However,  it  provided  an  opportunity  to  investigate  a  formula  for  evaluating  V-C 
dimension. 

3.1.2 Exact Values for V-C Dimension. Theorem 9 provides a value for
the number of ANN processors required to solve an arbitrary two-class classification
problem. The dual issue would be: What is the capability of an ANN with a fixed
number of hidden-layer nodes to solve two-class classification problems? Alternatively,
the question asks: What is the value of the V-C dimension of a feed-forward,
single hidden-layer, perceptron ANN? This issue can now be addressed directly with
a formula.

Let $N(p, \mathcal{H}, d)$ denote the maximum cardinality of a signed set in
$\mathbb{R}^d$ independent of coloring which can be guaranteed to be separated by $p$
processors (with associated Heaviside functions as the sigmoid) in the hidden-layer of
a feed-forward, single hidden-layer, perceptron ANN. Note that $N(p, \mathcal{H}, d)$
is the V-C dimension of $\mathcal{F}_p$ where

\[
\mathcal{F}_p = \Big\{ F : F(x) = \sum_{i=1}^{p} \omega_i \mathcal{H}(v_i \cdot x - \tau_i) \Big\}
\]

for $\omega_i \in \{0, 1\}$, $v_i \in \mathbb{R}^d$, and $\tau_i \in \mathbb{R}$.

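Theorem 10 Given $p$ processors in the hidden-layer of a feed-forward, single
hidden-layer, perceptron ANN, then

\[
N(p, \mathcal{H}, d) = \begin{cases}
pd + 1 & \text{for } p \text{ even},\ d \in \mathbb{N} \\
pd + 1 & \text{for } p \text{ odd},\ d \text{ even} \\
pd & \text{for } p \text{ odd},\ d \text{ odd.}
\end{cases}
\]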

Proof. The proof simply appeals to Theorem 9 to confirm that the values can, in
fact, be achieved.

Case 1 ($p$ even, $d$ even): Is $P(pd + 1, \mathcal{H}, d) = p$? Note that $p$ even
and $d$ even implies that $pd$ is even. Hence, $pd + 1$ is odd. Therefore,

\[
P(pd + 1, \mathcal{H}, d) = \left\lceil \frac{(pd + 1) - 1}{d} \right\rceil = p.
\]

Case 2 ($p$ even, $d$ odd): Is $P(pd + 1, \mathcal{H}, d) = p$? Note that $p$ even
and $d$ odd implies that $pd$ is even. Hence, $pd + 1$ is odd. Therefore,

\[
P(pd + 1, \mathcal{H}, d) = \left\lceil \frac{(pd + 1) - 1}{d} \right\rceil = p.
\]

Case 3 ($p$ odd, $d$ even): Is $P(pd + 1, \mathcal{H}, d) = p$? Note that $p$ odd
and $d$ even implies that $pd$ is even. Hence, $pd + 1$ is odd. Therefore,

\[
P(pd + 1, \mathcal{H}, d) = \left\lceil \frac{(pd + 1) - 1}{d} \right\rceil = p.
\]

Case 4 ($p$ odd, $d$ odd): Is $P(pd, \mathcal{H}, d) = p$? Note that $p$ odd and $d$
odd implies that $pd$ is odd. Hence, $pd + 1$ is even. Therefore,

\[
P(pd, \mathcal{H}, d) = \left\lceil \frac{pd - 1}{d} \right\rceil = p. \quad \Box
\]

Note that, for the particular set of ANN architectures described by
$\mathcal{F}_p$, $N(p, \mathcal{H}, d)$ is the value of the V-C dimension of the ANN.
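The case analysis can be cross-checked by brute force. The Python sketch below assumes that Theorem 9 reads $P(n,\mathcal{H},d) = \lceil n/d \rceil$ for even $n$ and $\lceil (n-1)/d \rceil$ for odd $n$, and it interprets $N(p,\mathcal{H},d)$ as the largest $n$ whose required node count does not exceed $p$. Both the interpretation and the names `P`, `N` and `N_brute` are assumptions made for this illustration.

```python
def P(n: int, d: int) -> int:
    """Hidden nodes required for n points in d dimensions (Theorem 9 as read above)."""
    m = n if n % 2 == 0 else n - 1
    return -(-m // d)  # integer ceiling of m / d


def N(p: int, d: int) -> int:
    """Closed form of Theorem 10: pd + 1 unless p and d are both odd."""
    return p * d + 1 if (p * d) % 2 == 0 else p * d


def N_brute(p: int, d: int) -> int:
    """Largest n with P(n, d) <= p, found by direct search."""
    return max(n for n in range(1, p * d + 10) if P(n, d) <= p)


if __name__ == "__main__":
    agree = all(N(p, d) == N_brute(p, d)
                for p in range(1, 13) for d in range(1, 13))
    print(agree)  # True
```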

3.2 The Relationship of $P(n, \mathcal{H}, d)$ and $N(p, \mathcal{H}, d)$

The  results  given  in  Theorem  9  and  Theorem  10  appear  to  present  solutions 
to  questions  that  are  duals  of  each  other,  i.e.,  the  results  are  inverses  of  each  other. 
If  this  were  the  case,  there  would  be  a  direct  relationship  between  the  maximum 
cardinality  of  a  classification  problem  that  can  be  implemented  by  a  given  number 
of  processors  and  the  number  of  processors  required  to  guarantee  the  implementa¬ 
tion  of  a  given  cardinality  of  an  arbitrary  arrangement  of  points.  Intuitively,  this 
makes  sense  and  could  be  valuable  when  actually  applying  ANNs  to  solve  a  problem. 
However,  guaranteeing  a  solution  of  an  arbitrary  set  inhibits  the  relationship. 

Consider the following. Clearly, $P(N(p, \mathcal{H}, d), \mathcal{H}, d) = p$ since this is how the
results in Theorem 10 were derived. In other words, $P(n, \mathcal{H}, d)$ was evaluated at
$n = N(p, \mathcal{H}, d)$. However, $N(P(n, \mathcal{H}, d), \mathcal{H}, d) \neq n$ in general. Consider $n = 4$ and
$d = 2$. By Theorem 9,

$P(4, \mathcal{H}, 2)$ =

= 2.

However, by Theorem 10,

\[ N(P(4, \mathcal{H}, 2), \mathcal{H}, 2) = N(2, \mathcal{H}, 2) = 2 \cdot 2 + 1 = 5 \neq 4. \]

This shows that $P$ and $N$ are not inverses as would be expected. In particular, $P$ is
the left inverse of $N$, i.e., $P \circ N = I$. However, $N \circ P \neq I$; i.e., $P$ is not the right
inverse of $N$. This conundrum originates in the semantics of the definitions of $N$ and
$P$. Neither of the values in the above example is incorrect. They are simply results
obtained by approaching the problem slightly differently, albeit appropriately
for their purposes. Had $P$ and $N$ been inverses, there would have been a stronger
mathematical relationship between the number of hyperplanes required to guarantee
the implementation of a dichotomy and the quantified notion of an ANN’s capability to
generalize about classification data. Consequently, the goal of building a relationship
between the two questions of capability and requirements goes awry. The blame
belongs to centering the quantifiers around the notion of arbitrary dichotomies of
arbitrary arrangements. Hence, the stage is, once again, set to investigate a radically
different approach to measuring the capabilities of ANNs.
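
The one-sided inverse relationship can be checked numerically. The sketch below fixes d = 2 and uses N(p) = 2p + 1, which agrees with the value N(2, H, 2) = 2 · 2 + 1 quoted above. The function P(n) = floor(n / 2) is only an assumed stand-in, chosen because it reproduces P(4, H, 2) = 2 and P ∘ N = I; it is not the formula of Theorem 9.

```python
def N(p: int) -> int:
    """Stand-in for N(p, H, 2); matches the quoted value N(2, H, 2) = 2*2 + 1."""
    return 2 * p + 1


def P(n: int) -> int:
    """Assumed stand-in for P(n, H, 2); chosen so that P(4) = 2 and P(N(p)) = p."""
    return n // 2


# P is a left inverse of N: P(N(p)) = p for every p tested.
left_inverse_holds = all(P(N(p)) == p for p in range(1, 100))

# N is not a left inverse of P: N(P(n)) overshoots whenever n is even.
right_inverse_failures = [n for n in range(2, 12) if N(P(n)) != n]

print(left_inverse_holds)        # True
print(N(P(4)))                   # 5, not 4
print(right_inverse_failures)    # [2, 4, 6, 8, 10]
```

Under these assumptions the failure of N ∘ P = I comes from P discarding information (two values of n map to the same p), which N cannot restore.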

3.3  Conclusions 

In summary, this chapter has presented strengthened results of Baum’s work
which lead to a formula for the V-C dimension of feed-forward, single hidden-layer,
perceptron artificial neural networks. Additionally, there was a discussion of the
duality of ANN capability analysis and how it relates to a possible inverse relationship
of the solutions. The fact that there is no complete inverse relationship serves
as additional motivation for the requirement of an approach to measuring an ANN’s
capability to generalize about classification data other than V-C dimension based
approaches.

IV. New Approach to Determining Artificial Neural Network Capabilities

There  are  two  issues  addressed  in  this  chapter.  Both  are  integral  to  the  design 
and  proper  implementation  of  certain  artificial  neural  network  (ANN)  architectures 
to  solve  classification  problems.  The  first  is  an  investigation  of  how  ANN  capabilities 
should  be  assessed.  Note  that  the  emphasis  is  not  on  what  the  capabilities  are, 
although  there  should  be  some  interesting  results,  but  rather  on  designing  a  mapping 
with  the  specific  characteristics  of  ANNs  considered.  The  second  issue  follows  from 
the first. Once a more informative mapping has been established, how can that
mapping help determine the efficient use of ANNs to solve classification problems?
In other words: Is there any utility in using an ANN to provide a solution or does it
require too many parameters to be learned from a finite training set?

This chapter will motivate why capabilities analysis should be directed at a
particular classification problem. Specifically, quantifiers will be defined that
have the properties required to determine ANN capabilities. In order to evaluate these
quantifiers, the complexity of the problem must be characterized along with a
characterization of the capability of the sets generated from the ANN. This motivates
the research given in the following chapters. Ultimately, the ANN capability quantifier
should be about colored arrangements of data instead of arbitrary arrangements like
V-C based quantifiers. In the end, this required a complete abandonment of the V-C
approach.

4.1  V-C  Based  Quantifiers  Defined 

Before defining new quantifiers, the V-C dimension based quantifiers will be
defined in the new notation. This will facilitate the analysis of the pitfalls of V-C
based quantifiers. Recall the following notation. Let $X \subset \mathbb{R}^d$, $Y$ a finite subset of
$X$, and $\Omega$ a collection of subsets of $X$. Let $\Delta^{\Omega}(Y)$ be the number of distinct sets
$\omega \cap Y$ for all $\omega \in \Omega$. That is,

\[ \Delta^{\Omega}(Y) = \operatorname{card}(\{\omega \cap Y : \omega \in \Omega\}). \]

Also let $\mathcal{S}_l$ denote the set of all subsets of $X$ with cardinality $l$. That is, given $l \in \mathbb{N}$,

\[ \mathcal{S}_l = \{Y \subset X : \operatorname{card}(Y) = l\}. \]

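Now, V-C dimension and Sontag’s mappings will be defined using the above
notation.

Definition. Given $l \in \mathbb{N}$ and $\Omega$ defined above, let

\[ \overline{m}^{\Omega}(l) = \sup\{\Delta^{\Omega}(Y) : Y \in \mathcal{S}_l\}. \]

Note that the set $\{\Delta^{\Omega}(Y) : Y \in \mathcal{S}_l\}$ is finite for each $l \in \mathbb{N}$. Hence, for all $Y \in \mathcal{S}_l$,
$\Delta^{\Omega}(Y)$ is bounded above by $2^l$ and below by 0. Therefore, the supremum and infimum
exist (are finite).

Definition. Given $\Omega$, define

\[ VC(\Omega) = \inf\{l \in \mathbb{N} : \overline{m}^{\Omega}(l) < 2^l\}. \]

Definition. Given $\Omega$, define

\[ \overline{\mu}(\Omega) = \sup\{l \in \mathbb{N} : \overline{m}^{\Omega}(l) = 2^l\}. \]

Note that $VC(\Omega) = \overline{\mu}(\Omega) = +\infty$ if $\overline{m}^{\Omega}(l) = 2^l$ for all $l \in \mathbb{N}$. Note that the V-C
dimension and $\overline{\mu}$ are not equal. In fact, $\overline{\mu}(\Omega) = VC(\Omega) - 1$ for an arbitrary set of
sets $\Omega$. Figure 3 shows graphically the relationship between V-C dimension and $\overline{\mu}$.

Similarly, under bar symbols can be defined to yield equivalent definitions of
Sontag’s $\underline{\mu}$.

Definition. Given $l \in \mathbb{N}$ and $\Omega$ defined above, let

\[ \underline{m}^{\Omega}(l) = \inf\{\Delta^{\Omega}(Y) : Y \in \mathcal{S}_l\}. \]

Definition. Given $\Omega$ defined above, define

\[ \underline{\mu}(\Omega) = \sup\{l \in \mathbb{N} : \underline{m}^{\Omega}(l) = 2^l\}. \]

Again, $\underline{\mu}(\Omega) = +\infty$ if $\underline{m}^{\Omega}(l) = 2^l$ for all $l \in \mathbb{N}$.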
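
These definitions can be evaluated by brute force when X is finite. The sketch below is illustrative only: the function names, the restriction of l to 1, ..., card(X), and the two toy collections are assumptions made for the example (with a finite X the +∞ case never arises). It counts the distinct traces of a collection on Y, takes the largest and smallest count over all Y of cardinality l, and reads off the "at least one set" quantifier, the "every set" quantifier, and the V-C dimension as the first l at which the largest count falls below 2^l.

```python
from itertools import combinations


def delta(omega, Y):
    """Number of distinct traces w ∩ Y over all sets w in the collection omega."""
    Y = frozenset(Y)
    return len({frozenset(w) & Y for w in omega})


def m_bar(omega, X, l):
    """Largest trace count over all subsets of X with cardinality l."""
    return max(delta(omega, Y) for Y in combinations(X, l))


def m_under(omega, X, l):
    """Smallest trace count over all subsets of X with cardinality l."""
    return min(delta(omega, Y) for Y in combinations(X, l))


def largest_full(m, omega, X):
    """sup{l : m(l) = 2^l}, with l restricted to 1..card(X); 0 if no such l."""
    sizes = [l for l in range(1, len(X) + 1) if m(omega, X, l) == 2 ** l]
    return max(sizes, default=0)


def vc(omega, X):
    """inf{l : m_bar(l) < 2^l}, with l restricted to 1..card(X)."""
    sizes = [l for l in range(1, len(X) + 1) if m_bar(omega, X, l) < 2 ** l]
    return min(sizes, default=float("inf"))


# Toy collection 1: all integer intervals (including the empty set) on {0,1,2,3}.
X = [0, 1, 2, 3]
intervals = [set(range(i, j)) for i in range(5) for j in range(i, 5)]
mu_bar_intervals = largest_full(m_bar, intervals, X)      # 2
mu_under_intervals = largest_full(m_under, intervals, X)  # 2
vc_intervals = vc(intervals, X)                           # 3

# Toy collection 2: shatters {0, 1} but never picks out the point 2.
X3 = [0, 1, 2]
small = [set(), {0}, {1}, {0, 1}]
mu_bar_small = largest_full(m_bar, small, X3)      # 2
mu_under_small = largest_full(m_under, small, X3)  # 0
```

In the first collection every pair is shattered and no triple is, so the two quantifiers coincide and the V-C value is one larger. In the second, a single shattered pair is enough for the "at least one set" quantifier, while the "every set" quantifier already fails at cardinality one.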

Now, with even more notation, a mathematical definition of $\mu$ can be stated.
In order to provide this definition, it was determined that Sontag’s definition of sets
being close was actually appealing to the Hausdorff metric. The Hausdorff metric of
two finite subsets of $X$, $A$ and $B$, is defined by

\[ h(A, B) = \max\{d(B, A), d(A, B)\}, \]

where

\[ d(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\|. \]

(7:34)
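
For finite point sets the Hausdorff metric can be computed directly from the two formulas above. The following is a minimal sketch assuming points are given as coordinate tuples and the norm is Euclidean; the function names are chosen here for illustration.

```python
import math


def directed(A, B):
    """d(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """h(A, B) = max{d(B, A), d(A, B)} for finite, non-empty point sets."""
    return max(directed(B, A), directed(A, B))


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (3.0, 0.0)]
print(directed(A, B))   # 1.0
print(directed(B, A))   # 2.0
print(hausdorff(A, B))  # 2.0
```

The example shows why both directed terms are needed: d(A, B) and d(B, A) differ, and only their maximum is symmetric.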

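Consider the following decomposition of $\mathcal{S}_l$ into disjoint subsets by defining
$\mathcal{M}_l^{\Omega}$ to be the set of subsets of $X$ with cardinality $l$ which can be shattered by $\Omega$,
and $\mathcal{N}_l^{\Omega}$ to be the set of subsets of $X$ with cardinality $l$ which cannot be shattered
by $\Omega$. Specifically,

\[ \mathcal{M}_l^{\Omega} = \{Y \in \mathcal{S}_l : \Delta^{\Omega}(Y) = 2^l\} \]

and

\[ \mathcal{N}_l^{\Omega} = \{Y \in \mathcal{S}_l : \Delta^{\Omega}(Y) < 2^l\}. \]

Then, $\mathcal{S}_l = \mathcal{M}_l^{\Omega} \cup \mathcal{N}_l^{\Omega}$ and $\mathcal{M}_l^{\Omega} \cap \mathcal{N}_l^{\Omega} = \emptyset$ for each $l \in \mathbb{N}$.

Definition. Given $\Omega$, define

\[ \mu(\Omega) = \max\{l \in \mathbb{N} : \mathcal{M}_l^{\Omega} \text{ is dense in } \mathcal{S}_l\}, \]

where $\mathcal{M}_l^{\Omega}$ is dense in $\mathcal{S}_l$ if given $\epsilon > 0$ and $S \in \mathcal{S}_l$, there exists $S' \in \mathcal{M}_l^{\Omega}$ such that
$h(S, S') < \epsilon$.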


4.2  New  Quantifiers  Based  on  V-C  Dimension 

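4.2.1 $\alpha$ Shattering. One attempt to “weaken” V-C based quantifiers ($\mu$,
$\underline{\mu}$, and $\overline{\mu}$) is to change how shattering is addressed. The first alteration to be made
deals with the fact that all of the previous quantifiers are based on achieving every
dichotomy. This is fairly restrictive. It is not always necessary to achieve every
dichotomy. In this sense, the new quantifiers should require only a portion, $\alpha \in [0, 1]$,
of the dichotomies of a set to be achieved. Let $\alpha \in [0, 1]$ and define

\[ \overline{\nu}_{\alpha}(\Omega) = \begin{cases} +\infty & \text{if } \frac{\overline{m}^{\Omega}(l)}{2^l} \in [\alpha, 1] \ \forall\, l \in \mathbb{N} \\ \inf\left\{l \in \mathbb{N} : \frac{\overline{m}^{\Omega}(l)}{2^l} \in [0, \alpha]\right\} & \\ 0 & \text{if } \frac{\overline{m}^{\Omega}(l)}{2^l} \in [0, \alpha) \ \forall\, l \in \mathbb{N} \end{cases} \tag{7} \]

and

\[ \underline{\nu}_{\alpha}(\Omega) = \begin{cases} +\infty & \text{if } \frac{\underline{m}^{\Omega}(l)}{2^l} \in [\alpha, 1] \ \forall\, l \in \mathbb{N} \\ \sup\left\{l \in \mathbb{N} : \frac{\underline{m}^{\Omega}(l)}{2^l} \in [\alpha, 1]\right\} & \\ 0 & \text{if } \frac{\underline{m}^{\Omega}(l)}{2^l} \in [0, \alpha) \ \forall\, l \in \mathbb{N}. \end{cases} \tag{8} \]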


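Theorem 11 The following properties are true for any $\Omega$ and $\alpha, \alpha_1, \alpha_2 \in [0, 1)$.

1. $\overline{\nu}_1 = \overline{\mu}$.

2. $\overline{\nu}_{\alpha} \geq \overline{\mu}$.

3. $\underline{\nu}_1 = \underline{\mu}$.

4. $\underline{\nu}_{\alpha} \geq \underline{\mu}$.

5. $\overline{\nu}_{\alpha} \geq \underline{\nu}_{\alpha}$.

6. If $\alpha_1 < \alpha_2$, then $\overline{\nu}_{\alpha_1} \leq \overline{\nu}_{\alpha_2}$ and $\underline{\nu}_{\alpha_1} \leq \underline{\nu}_{\alpha_2}$.

Proof. Let $\Omega$ be a collection of subsets of $X \subset \mathbb{R}^d$.

1. If $\alpha = 1$, then

\[ \overline{\nu}_1(\Omega) = \begin{cases} +\infty & \text{if } \frac{\overline{m}^{\Omega}(l)}{2^l} = 1 \ \forall\, l \in \mathbb{N} \\ \inf\left\{l \in \mathbb{N} : \frac{\overline{m}^{\Omega}(l)}{2^l} \in [0, 1]\right\} & \end{cases} \]

which is $\overline{\mu}(\Omega)$.

2. Since

\[ \inf\left\{l \in \mathbb{N} : \frac{\overline{m}^{\Omega}(l)}{2^l} \in [0, \alpha]\right\} \geq \sup\{l \in \mathbb{N} : \overline{m}^{\Omega}(l) = 2^l\}, \]

then $\overline{\nu}_{\alpha} \geq \overline{\mu}$.

3. Follows similarly to 1.

4. Follows similarly to 2.

5. By definition of $\overline{m}^{\Omega}(l)$ and $\underline{m}^{\Omega}(l)$.

6. Let $\alpha_1 < \alpha_2$, then $[0, \alpha_1] \subset [0, \alpha_2]$. Hence,

\[ \inf\left\{l \in \mathbb{N} : \frac{\overline{m}^{\Omega}(l)}{2^l} \in [0, \alpha_1]\right\} \leq \inf\left\{l \in \mathbb{N} : \frac{\overline{m}^{\Omega}(l)}{2^l} \in [0, \alpha_2]\right\}. \]

Therefore, $\overline{\nu}_{\alpha_1} \leq \overline{\nu}_{\alpha_2}$. Proof of $\underline{\nu}_{\alpha_1} \leq \underline{\nu}_{\alpha_2}$ is similar.

□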

See  Figures  3  and  4.  These  figures  illustrate  the  relationship  between  V-C 
based  quantifiers  and  the  new  quantifiers. 

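For the “$\nu_{\alpha}$-equivalent” to $\mu$, more notation is needed.

Definition. Given $l \in \mathbb{N}$ and $\epsilon > 0$, define the $\epsilon$-ball centered at the set $S \in \mathcal{S}_l$ as

\[ B_l(S, \epsilon) = \{T \in \mathcal{S}_l : h(T, S) < \epsilon\}. \]

Hence, $B_l(S, \epsilon)$ is the set of sets of cardinality $l$ that are within $\epsilon$ of $S$ with respect
to the Hausdorff metric. Now $\nu_{\alpha}$ can be defined.

Definition. Given $l \in \mathbb{N}$ and $\Omega$, $B_l(S, \epsilon)$ defined above, let

\[ m_{\epsilon}^{\Omega}(l) = \inf\{\Delta^{\Omega}(T) : T \in B_l(S, \epsilon)\}. \]

Now, given $\epsilon$ and $\alpha \in [0, 1]$, define

\[ \nu_{\alpha}(\Omega) = \begin{cases} +\infty & \text{if } \frac{m_{\epsilon}^{\Omega}(l)}{2^l} \in [\alpha, 1] \ \forall\, l \in \mathbb{N} \\ \inf\left\{l \in \mathbb{N} : \frac{m_{\epsilon}^{\Omega}(l)}{2^l} \in [0, \alpha]\right\} & \\ 0 & \text{if } \frac{m_{\epsilon}^{\Omega}(l)}{2^l} \in [0, \alpha) \ \forall\, l \in \mathbb{N} \end{cases} \tag{9} \]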

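4.2.2 Between “At Least One” and “Every” Set. Another deviation from
the base set of quantifiers should be about the rigid difference between $\underline{\mu}$ and $\overline{\mu}$ and,
likewise, between $\underline{\nu}_{\alpha}$ and $\overline{\nu}_{\alpha}$. For $\underline{\nu}_{\alpha}$ to have the value $l$, it is required that every set
of cardinality $l$ must satisfy the conditions prescribed. For $\overline{\nu}_{\alpha}$ to have the value $l$, it
is required that only one set of cardinality $l$ must satisfy the prescribed conditions.
This is an extreme difference. In an attempt to “weaken” the mapping, another
parameter is incorporated. Define $\mathcal{S}_l^{\beta}$ as some “reduced portion” of $\mathcal{S}_l$, i.e., $\mathcal{S}_l^{\beta} \subset \mathcal{S}_l$.
Then, redefine $\overline{m}^{\Omega}(l)$ and $\underline{m}^{\Omega}(l)$ on the reduced set $\mathcal{S}_l^{\beta}$. Consider

\[ \overline{m}^{\Omega}(l) = \sup\{\Delta^{\Omega}(Y) : Y \in \mathcal{S}_l^{\beta}\} \]

and

\[ \underline{m}^{\Omega}(l) = \inf\{\Delta^{\Omega}(Y) : Y \in \mathcal{S}_l^{\beta}\}. \]

Now, define

\[ \overline{\nu}_{\alpha,\beta}(\Omega) = \begin{cases} +\infty & \text{if } \frac{\overline{m}^{\Omega}(l)}{2^l} \in [\alpha, 1] \ \forall\, l \in \mathbb{N} \\ \inf\left\{l \in \mathbb{N} : \frac{\overline{m}^{\Omega}(l)}{2^l} \in [0, \alpha]\right\} & \\ 0 & \text{if } \frac{\overline{m}^{\Omega}(l)}{2^l} \in [0, \alpha) \ \forall\, l \in \mathbb{N} \end{cases} \tag{10} \]

and

\[ \underline{\nu}_{\alpha,\beta}(\Omega) = \begin{cases} +\infty & \text{if } \frac{\underline{m}^{\Omega}(l)}{2^l} \in [\alpha, 1] \ \forall\, l \in \mathbb{N} \\ \sup\left\{l \in \mathbb{N} : \frac{\underline{m}^{\Omega}(l)}{2^l} \in [\alpha, 1]\right\} & \\ 0 & \text{if } \frac{\underline{m}^{\Omega}(l)}{2^l} \in [0, \alpha) \ \forall\, l \in \mathbb{N}. \end{cases} \tag{11} \]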

Note that, in a sense, $\mu$ and $\nu$ already have this feature since sets not in general
position are not considered. So, the $\beta$ parameter just extends that freedom to
sets with no particular characteristics and, hence, a version of $\nu$ incorporating $\beta$ is
redundant.

4.2.3 Inadequacies of $\overline{\nu}_{\alpha,\beta}$ and $\underline{\nu}_{\alpha,\beta}$. The above attempts to customize
V-C based quantifiers for ANN capabilities analysis were attempts to incorporate the
desired properties at the wrong point of analysis. They are attempts to characterize
separating surfaces when, in fact, this customization needs to be transferred to the
problem set.

Additionally, it is clear that there are two distinct, but obviously related, questions
to be answered: What are the particular requirements of an ANN architecture
in order to solve a given classification problem? and How do different ANN
architectures compare in capability? Consider the V-C based quantifiers. Theoretically,
they provide worst case notions of capabilities. Chapter III provided theorems for
inverting these concepts to get worst case requirements. How useful is this since it is
the worst case arrangements of data being considered? The crux of the new research
in this dissertation is that it would be helpful to perform similar analysis on the
colored arrangements in a way that is accessible directly for implementation.

4.3  Conclusions 

In summary, it has been shown that the difficulty of measuring an ANN’s
capability is partially because each approach discussed here has sought to assign a value
to the capability. Alternatively, consider ordering a set of ANNs based on capability.
This would provide a means of comparing ANNs based on capability without
having to assign a value. Additionally, it should be clear that the comparison should
be based on an ANN’s ability to solve specific classification problems, not arbitrary
classification problems.

V. Artificial Neural Networks, Combinatorial Geometry, and Lattice Theory

This chapter introduces concepts of combinatorial geometry of hyperplane
arrangements and lattice theory that are relevant to studying capabilities of artificial
neural networks (ANNs). These concepts provide the basis for a generalized
approach (derived in Chapter VI) to quantifying ANNs’ capabilities based on invariants.
Roughly speaking, invariants are mathematical objects which do not change
given certain prescribed transformations of their domain. These invariants impinge
upon many aspects of the geometry of the arrangement. Borrowing from these ideas,
ANN capability quantifiers will be constructed in Chapter VII. To analyze some
invariants an underlying lattice structure is required. Therefore the lattice structures
of ANNs will be defined in this chapter. Finally, it will be shown that the V-C
dimension of a feed-forward, single hidden-layer, perceptron ANN is equivalent to
the number of chambers of the hyperplane arrangement defined by the ANN.

A majority of the results of combinatorial geometry rely on a lattice structure.
Hence, this chapter will investigate the lattice structures of feed-forward, single
hidden-layer, perceptron artificial neural networks. Additionally, lattice theory
provides a mechanism for comparing sets. Hence, establishing a lattice on sets generated
by ANNs will give insight into how architectures compare in their ability to generalize.

Although the study of hyperplane arrangements is a relatively young field of
mathematics (important works by B. Grünbaum published in 1971 (21) and by
Zaslavsky in 1975 (50)), there is a wealth of theorems and algorithms answering current
applied mathematics problems. In particular, methods for counting geometric
structures of an arrangement such as edges, faces, and chambers have been established
(3). The importance of these counting algorithms for measuring ANN capabilities is
discussed in this chapter.

The link between the study of arrangements of hyperplanes and the
Vapnik-Chervonenkis (V-C) dimension relies on chamber counting. Chamber counting is a
basic objective of many combinatorial geometric algorithms. One example of counting
is based upon the method of deletion-restriction which relies on the Poincaré
polynomial of an arrangement of hyperplanes (36). This notion is intricately related
to quantifying the capabilities of ANNs.

It is important to note that results obtained here are exclusive to the ANN
architecture described in Chapter I. Also, all combinatorial geometric results in this
chapter are based on the assumption that the arrangement of hyperplanes is in
general position. That is, if any two planes have a common line, then the line is
distinct, and if any three planes have a common point, then the point is distinct.

5.1  Combinatorial  Geometry  -  The  Basics 

In  a  broad  sense,  combinatorial  geometry  is  the  theory  of  arithmetic  invariants 
of  finite  sets  of  points  in  projective  space"^  (hO).  Some  important  invariants  depend 
upon  the  lattice  structure  of  the  subspaces  formed  by  a  point  set.  Of  particular 
importance to ANN research are point sets which are the duals of an arrangement
of  hyperplanes.  Fortunately,  lattice  structures  can  be  defined  for  arrangements  of 
hyperplanes  (50),  making  the  rich  results  about  geometric  lattices  available  for  ANN 
analysis.  Geometric  lattices  are  lattices  that  satisfy  additional  conditions  requiring 
that  the  arrangement  be  central,  that  is,  each  hyperplane  contains  a  point  common 
to  all  of  the  hyperplanes  in  the  arrangement  (1). 

5.1.1 Lattice Structures. Lattice theory is required to exploit the results
of combinatorial geometry for ANN analysis. Lattice theory, in general, is the study
of orderings. An ordering is a binary relation, $\preceq$, which can be read as “is contained
in”, “is a part of”, or “is less than or equal”. The purpose of an ordering is to be
able to compare elements of a set. To make meaningful comparisons, an ordering is
required to satisfy certain properties. The number and type of properties required
dictates directly what can be ascertained from the comparison of elements in the set.

The most basic properties define a partially ordered set.

Definition. A partially ordered set is a set, $P$, together with a binary relation, $\preceq$,
which satisfies, for all $x, y, z \in P$, the following properties:

1. $x \preceq x$. (Reflexive)

2. If $x \preceq y$ and $y \preceq x$, then $x = y$. (Antisymmetry)

3. If $x \preceq y$ and $y \preceq z$, then $x \preceq z$. (Transitive)

If $x \preceq y$ and $x \neq y$, then we write $x \prec y$. Note that the term partially is used to
indicate that the relation $\preceq$ is not necessarily closed. In other words, not all elements
of the set are required to be comparable. Hence, there may be $x, y \in P$ for which
$x \neq y$, $x \not\preceq y$, and $y \not\preceq x$. Additionally, $y$ is said to cover $x$ if $x \prec y$ and there is no $z \in P$
such that $x \prec z \prec y$. Also, if there is a unique element $z \in P$ such that $z \preceq x$ for all
$x \in P$, then $z$ is called the zero element of $P$.
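
As a concrete illustration, the three axioms, the covering relation, and the existence of incomparable elements can be checked mechanically on a small example. The sketch below uses the divisors of 12 ordered by divisibility; this example poset and the function names are assumptions chosen for illustration and do not appear in the text.

```python
from itertools import product

# Example poset (an assumption for illustration): divisors of 12 under divisibility.
P = [1, 2, 3, 4, 6, 12]


def leq(x, y):
    """x precedes-or-equals y when x divides y."""
    return y % x == 0


def is_partial_order(P, leq):
    """Check the reflexive, antisymmetric, and transitive properties by enumeration."""
    reflexive = all(leq(x, x) for x in P)
    antisymmetric = all(x == y for x, y in product(P, P) if leq(x, y) and leq(y, x))
    transitive = all(
        leq(x, z) for x, y, z in product(P, P, P) if leq(x, y) and leq(y, z)
    )
    return reflexive and antisymmetric and transitive


def covers(P, leq, y, x):
    """True if y covers x: x strictly precedes y with no element strictly between."""
    if x == y or not leq(x, y):
        return False
    return not any(z not in (x, y) and leq(x, z) and leq(z, y) for z in P)


# Pairs that are not comparable in either direction (listed once, smaller number first).
incomparable = [
    (x, y) for x, y in product(P, P) if x < y and not leq(x, y) and not leq(y, x)
]

# Elements that precede every element of P; a unique one is the zero element.
zero_elements = [z for z in P if all(leq(z, x) for x in P)]

print(is_partial_order(P, leq))  # True
print(incomparable)              # [(2, 3), (3, 4), (4, 6)]
print(zero_elements)             # [1]
```

Here 12 covers both 4 and 6, while 4 does not cover 1 because 2 lies strictly between them.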

Definition. Let $(P, \preceq)$ be a partially ordered set. A lower bound (upper bound) of
a subset $X$ of $P$ is an element $a \in P$ ($b \in P$) such that $a \preceq x$ for all $x \in X$ ($x \preceq b$
for all $x \in X$).

Definition. Let $(P, \preceq)$ be a partially ordered set. A greatest lower bound (least
upper bound) of $X$ is a lower bound $a \in P$ (upper bound $b \in P$) such that $a' \preceq a$
($b \preceq b'$) for all lower bounds $a'$ (upper bounds $b'$) of $X$.

Definition. Let $(P, \preceq)$ be a partially ordered set. Let $x_1, x_2, \ldots, x_n \in P$ such that
$x_1 \preceq x_2 \preceq \ldots \preceq x_n$, then $x_1 \prec x_2 \prec \ldots \prec x_n$ is said to be a chain. A saturated chain
is a chain, $x_1 \prec x_2 \prec \ldots \prec x_n$, such that $x_{i+1}$ covers $x_i$ for all $i < n$. The length of a
chain is defined as one less than its cardinality. (5:14)

Definition. Let $(P, \preceq)$ be a partially ordered set. The rank function of an element
$x \in P$, $r(x)$, is defined as the maximum length of all saturated chains from $z$ to $x$
where $z = \operatorname{glb}(P)$. (5:14)


Definition. Let $(P, \preceq)$ be a partially ordered set. The meet of any two elements
$x, y \in P$, denoted by $x \wedge y$, is defined as the greatest lower bound of the set $\{x, y\}$.
That is

\[ x \wedge y = \operatorname{glb}\{x, y\}. \]

The join, denoted by $x \vee y$, is defined as the least upper bound of the set $\{x, y\}$. That is

\[ x \vee y = \operatorname{lub}\{x, y\}. \]

Note that the definitions of the meet and join are inherently dependent on the
ordering. In other words, the ordering defines the meet and join.

Definition. A partially ordered set, $(P, \preceq)$, with operations meet, $\wedge$, and join, $\vee$,
such that $P$ is closed with respect to $\wedge$ and $\vee$ is called a lattice and is denoted
$(P, \preceq, \wedge, \vee)$.

One  should  note  that  it  is  not  always  the  case  that  the  set,  P,  is  closed  with 
respect  to  the  meet  and  join  operations.  Hence,  we  have  the  following  definitions. 

Definition. A partially ordered set, $(P, \preceq)$, with the operation meet, $\wedge$, such that
$P$ is closed with respect to $\wedge$ is called a meet semi-lattice and is denoted $(P, \wedge)$.

Definition. A partially ordered set, $(P, \preceq)$, with the operation join, $\vee$, such that
$P$ is closed with respect to $\vee$ is called a join semi-lattice and is denoted $(P, \vee)$.

The  term  semi-lattice  will  be  used  to  refer  to  a  join  semi-lattice  or  a  meet  semi-lattice. 
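
The distinction between a lattice and a semi-lattice can be seen by computing meets and joins directly from the ordering. The sketch below is an illustration under assumed names and an assumed example: the divisors of 12 under divisibility form a lattice, and removing the top element 12 leaves a set that is closed under meet but not under join.

```python
def leq(x, y):
    """Divisibility ordering used for the example: x precedes y when x divides y."""
    return y % x == 0


def glb(P, leq, S):
    """Greatest lower bound of S within P, or None if it does not exist."""
    lower = [a for a in P if all(leq(a, x) for x in S)]
    greatest = [a for a in lower if all(leq(b, a) for b in lower)]
    return greatest[0] if greatest else None


def lub(P, leq, S):
    """Least upper bound of S within P, or None if it does not exist."""
    upper = [b for b in P if all(leq(x, b) for x in S)]
    least = [b for b in upper if all(leq(b, c) for c in upper)]
    return least[0] if least else None


P = [1, 2, 3, 4, 6, 12]   # closed under meet and join: a lattice
Q = [1, 2, 3, 4, 6]       # top element removed

print(glb(P, leq, [4, 6]))  # 2
print(lub(P, leq, [4, 6]))  # 12
print(lub(Q, leq, [4, 6]))  # None: Q is not closed under join

# Every pair in Q still has a meet, so Q is a meet semi-lattice.
q_closed_under_meet = all(glb(Q, leq, [x, y]) is not None for x in Q for y in Q)
```

The same ordering produces both outcomes; only the underlying set changes, which is the sense in which the ordering defines the meet and join.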

Now,  consider  the  properties  of  a  lattice  as  presented  by  the  following  lemma. 

Lemma 5 Let $(P, \preceq, \wedge, \vee)$ be a lattice, then, for all $x, y, z \in P$, the following laws
hold true:

(1) $x \wedge x = x$, $x \vee x = x$. (Idempotent)

(2) $x \wedge y = y \wedge x$, $x \vee y = y \vee x$. (Commutative)

(3) $x \wedge (y \wedge z) = (x \wedge y) \wedge z$, $x \vee (y \vee z) = (x \vee y) \vee z$. (Associative)

(4) $x \wedge (x \vee y) = x \vee (x \wedge y) = x$. (Absorption)

Moreover,

(5) $x \preceq y \iff x \wedge y = x$ and $x \preceq y \iff x \vee y = y$. (Consistency) (12)
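
The five laws can be verified exhaustively on a small lattice. The sketch below assumes, purely for illustration, the lattice of all subsets of a three-element set ordered by inclusion, where the meet is intersection and the join is union; the variable names are not from the text.

```python
from itertools import combinations, product

ground = (0, 1, 2)
# All subsets of the ground set, ordered by inclusion (an assumed example lattice).
P = [frozenset(c) for r in range(len(ground) + 1) for c in combinations(ground, r)]


def leq(x, y):
    return x <= y      # subset inclusion


def meet(x, y):
    return x & y       # intersection is the greatest lower bound


def join(x, y):
    return x | y       # union is the least upper bound


pairs = list(product(P, repeat=2))
triples = list(product(P, repeat=3))

laws = {
    "idempotent": all(meet(x, x) == x and join(x, x) == x for x in P),
    "commutative": all(
        meet(x, y) == meet(y, x) and join(x, y) == join(y, x) for x, y in pairs
    ),
    "associative": all(
        meet(x, meet(y, z)) == meet(meet(x, y), z)
        and join(x, join(y, z)) == join(join(x, y), z)
        for x, y, z in triples
    ),
    "absorption": all(
        meet(x, join(x, y)) == x and join(x, meet(x, y)) == x for x, y in pairs
    ),
    "consistency": all(
        leq(x, y) == (meet(x, y) == x) and leq(x, y) == (join(x, y) == y)
        for x, y in pairs
    ),
}

print(laws)
```

A check of this kind does not prove the lemma, but it is a quick way to catch a wrongly defined meet or join when a new ordering is introduced.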

With the structures described above, a large variety of combinatorial problems,
such as the chamber counting problem, can be addressed. Combinatorial problems
can often be restated as critical problems (17). Critical problems bridge lattice
theory to combinatorial geometry of hyperplane arrangements. Critical problems
can be solved by analyzing the characteristic polynomial, also known as the Poincaré
polynomial, of a lattice or semi-lattice.

It will be shown by the new research in this dissertation that hyperplane
arrangements can have a lattice structure and, in general, have a semi-lattice structure.
Moreover, the number of chambers created by an arrangement can be ascertained
from the characteristic polynomial defined on the lattice (or semi-lattice) of the
hyperplane arrangement. The novelty of this approach to solving geometric problems
is that the solutions are arithmetic invariants described by the characteristic
polynomial and depend only on the geometry of the original point set (17).

5.2  The  Theory  of  Arrangements  of  Hyperplanes 

Of particular interest to the research in this dissertation are the results of
combinatorial geometry applied to arrangements of hyperplanes. This is yet another
relatively new area of rich mathematics that is pertinent to the study of ANNs.

Consider a finite set of hyperplanes (translated subspaces with dimension $d - 1$)
of a Euclidean or projective $d$-dimensional space which will be referred to as an
arrangement. When these hyperplanes are removed, the remainder of the space
is partitioned into disjoint subsets known as chambers, each one a $d$-dimensional
polyhedron (not necessarily bounded). The arrangement is said to partition the
space by hyperplanes. How many chambers are created by the partition? How many
vertices result from the hyperplane intersections? Are two arrangements equivalent
geometrically? These are some of the questions that the theory of arrangements of
hyperplanes seeks to answer and will be a central concept in the development of
geometric measures of ANN capabilities.

5.2.1 The Cut-Intersection Semi-Lattice. Let $A$ denote an arrangement of
hyperplanes (so $l \in A$ is a particular hyperplane). The cut-intersection semi-lattice
of an arrangement is actually defined on the intersections of the hyperplanes and is
denoted $L(A)$. That is,

\[ L(A) = \Big\{ \bigcap_{l \in T} l : \bigcap_{l \in T} l \neq \emptyset \text{ for all } T \subseteq A \Big\}. \]

An element, $x \in L(A)$, could be any $k$-dimensional translated subspace, for $k \in
\{0, 1, 2, \ldots, d - 1\}$, contained in some hyperplane $a \in A$. The partial ordering, $\preceq$, is
chosen to be reverse set containment. That is, for $x, y \in L(A)$, write $x \preceq y$ if and
only if $x \supseteq y$. Then, $(L(A), \preceq)$ is a partially ordered set. (50)

The ordering is chosen so that the minimal element of $L(A)$ is the entire space
containing the hyperplanes. The existence of this minimal element is important for
the results that follow. Note that there is no guarantee of the existence of a maximal
element, which would be expected to be the empty set. Recall that $\emptyset \notin L(A)$. However, with
stronger conditions on the arrangement $A$, the existence of a maximal element can
be guaranteed.

Now, consider the definitions of meet and join for $L(A)$. Since the ordering is
reverse set inclusion, the definitions are not intuitive. In fact, the meet operation,
as prescribed by the definitions in Section 5.1.1, is not a closed operation. Hence,
$L(A)$ will be called a semi-lattice. A modified definition of the meet is given so that
the set is closed with respect to the meet. Define the modified meet operation of
$x, y \in L(A)$ by

\[ x \wedge y = \operatorname{glb}\{x, y\} = \bigcap\{l \in A : l \supseteq x \cup y\}. \]

The join operation (as defined by the definition in Section 5.1.1) of $x, y \in L(A)$
becomes

\[ x \vee y = \operatorname{lub}\{x, y\} = x \cap y. \]

Note that $x \cap y$ is not necessarily in $L(A)$, since $x \cap y$ could be empty. Hence $L(A)$ is
not closed with respect to the join operation. It has been shown that $(L(A), \wedge, \vee)$
satisfies all of the properties of Lemma 5 (50). Hence, $(L(A), \wedge, \vee)$ will be referred
to as the cut-intersection semi-lattice of an arrangement $A$. Note that it is a
semi-lattice since $L(A)$ is not closed with respect to the join operation.
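
A small planar example makes the failure of closure under the join concrete. The sketch below is an illustration only: it assumes an arrangement of three lines in the plane, written as coefficient triples (a, b, c) for a·x + b·y = c, two of which are parallel. The names `lines`, `intersect`, and `joins` are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations


def intersect(l1, l2):
    """Intersection point of two distinct lines a*x + b*y = c, or None if parallel."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel lines: the intersection is empty
    # Cramer's rule for the 2x2 linear system.
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))


# Assumed example arrangement: x = 0, x = 1 (parallel), and y = 0.
lines = {"l1": (1, 0, 0), "l2": (1, 0, 1), "l3": (0, 1, 0)}

# The join of two hyperplanes is their intersection, when it is non-empty.
joins = {(m, n): intersect(lines[m], lines[n]) for m, n in combinations(lines, 2)}
points = {p for p in joins.values() if p is not None}

# Elements of L(A): the whole plane "V", the three lines, and the intersection points.
L = ["V"] + list(lines) + sorted(points)

print(len(L))                # 6
print(joins[("l1", "l2")])   # None: this join is not an element of L(A)
```

The two parallel lines have no common point, so their join is missing from L(A), while every pair of elements still has the whole plane as a common lower bound.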

Finally, consider the lattice structure of an arrangement that is said to be
centered.

Definition. An arrangement, $A$, is centered if

\[ \bigcap_{l \in A} l \neq \emptyset. \]

Note that $L(A)$ defined on a central arrangement, $A$, is assured to have a maximal
element. In fact, $(L(A), \wedge, \vee)$ is a lattice (with the prescribed meet operation, not
the modified meet operation) and is often referred to as a geometric lattice. (36:24)

5.3  Chamber  Counting  and  the  Poincare  Polynomial 

Chamber counting is a fundamental combinatorial geometry problem. Not only
does chamber counting provide valuable insight into arrangements of hyperplanes
(for the purposes of this research), but it also serves as a benchmark problem for
testing algorithms or methods much like the map coloring problem. In fact, there is
a significant amount of research dedicated to counting methods for many geometric
entities such as chambers, faces, vertices, and edges.

In 1889, S. Roberts published breakthrough research that included a very simple
formula for counting chambers. According to Roberts, the number of regions
formed by an arbitrary arrangement of $n$ lines in the Euclidean plane is equal to the
number of regions formed by $n$ lines in general position, minus the number of regions
lost because of multiple points, minus the number of regions lost because of parallel
lines. This formula led to various algorithms for determining actual counts, one of
which was proposed by T. Zaslavsky (50) in 1975. Zaslavsky’s approach is known
as the method of deletion and restriction. In addition, he showed that the recursive
formula used for the method of deletion and restriction produces the same result for
counting chambers as the characteristic polynomial evaluated at one.

5.3.1 The Method of Deletion and Restriction. Let $A$ be an arrangement
in the Euclidean $d$-dimensional space, and let $l \in A$ be a hyperplane.

Definition. The deleted arrangement about $l$ is defined as $A' = A \setminus \{l\}$. The
restricted arrangement about $l$ is defined as $A'' = \{K \cap l \mid K \in A'\}$.

The triple $(A, A', A'')$ can be used to solve the problem of counting chambers.
Let $\mathcal{C}(A)$ denote the set of chambers formed by $A$. Zaslavsky showed that

\[ \operatorname{card}(\mathcal{C}(A)) = \operatorname{card}(\mathcal{C}(A')) + \operatorname{card}(\mathcal{C}(A'')). \]

To prove this recursion, let $P$ be the set of chambers in $\mathcal{C}(A')$ that intersect the
distinguished hyperplane, $l$, and let $Q$ be the set of chambers in $\mathcal{C}(A')$ that do not
intersect $l$. Obviously, $\operatorname{card}(\mathcal{C}(A')) = \operatorname{card}(P) + \operatorname{card}(Q)$. Note that the hyperplane
$l$ divides each chamber of $P$ into 2 chambers and does not intersect the chambers of
$Q$. Hence, $\operatorname{card}(\mathcal{C}(A)) = 2\operatorname{card}(P) + \operatorname{card}(Q)$. In fact, there is a bijection between $P$
and $\mathcal{C}(A'')$ given by $C \mapsto C \cap l$. Therefore, $\operatorname{card}(\mathcal{C}(A'')) = \operatorname{card}(P)$, which provides
the recursion. (50) (36)
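
The recursion can be run directly when the arrangement is in general position, since the chamber count then depends only on the number of hyperplanes n and the dimension d. The sketch below rests on that assumption: deleting a hyperplane leaves n - 1 hyperplanes in dimension d, and restricting to it leaves n - 1 hyperplanes in dimension d - 1. The comparison against the sum of binomial coefficients uses the standard closed form for general position, which is not stated in the text; the function names are illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def chambers(n: int, d: int) -> int:
    """Chambers cut by n hyperplanes in general position in d-dimensional space,
    computed by deletion and restriction."""
    if n == 0 or d == 0:
        return 1  # no hyperplanes, or a single point: one chamber
    # card C(A) = card C(A') + card C(A'')
    return chambers(n - 1, d) + chambers(n - 1, d - 1)


def closed_form(n: int, d: int) -> int:
    """Standard general-position count: sum of C(n, k) for k = 0..d."""
    return sum(comb(n, k) for k in range(d + 1))


print(chambers(3, 2))   # 7 regions for three lines in general position
print(chambers(4, 3))   # 15 regions for four planes in general position
```

For arrangements that are not in general position the same recursion still holds, but the deleted and restricted arrangements must then be constructed explicitly rather than summarized by the pair (n, d).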


5.3.2 The Poincaré Polynomial. Zaslavsky also showed that a similar
recursion holds for the characteristic polynomial. The characteristic polynomial is a
specific instantiation of the Poincaré polynomial, which is a polynomial defined on
the geometric structure of an arrangement and is used for the analysis of invariants
about the cut-intersection semi-lattice, $L(A)$. The Poincaré polynomial is one of the
most important combinatorial invariants of an arrangement. Its properties provide
insight into the structures of an arrangement. The Poincaré polynomial is defined
based on the Möbius function and a rank function.

The Möbius function, defined on $L(A)$, is a binary mapping $\mu : L(A) \times L(A) \to
\mathbb{Z}$. It provides a characterization for arrangement density, by characterizing the
relationships of the subsets based on the partial ordering defined for $L(A)$. In general,
there is not an explicit formula for $\mu$. However, for fixed $x$, the values of $\mu(x, y)$ may
be computed recursively as follows. For $x, y, z \in L(A)$,

\[ \mu(x, y) = \begin{cases} 1 & \text{if } x = y \\ -\sum_{x \preceq z \prec y} \mu(x, z) & \text{if } x \prec y \\ 0 & \text{else.} \end{cases} \tag{12} \]

Note that any function satisfying the above properties is equal to $\mu$ (36:33). That
is, $\mu$ is uniquely defined. The rank function, defined on $L(A)$, is as defined in Section
5.1.1. Note that in the case of $L(A)$, the rank function on $x \in L(A)$ can be defined
as $r(x) = \operatorname{codim}(x)$ (36:24).
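
The recursive definition translates directly into code for any finite partially ordered set. The sketch below is a generic illustration, not tied to a particular arrangement: it assumes the divisors of 12 ordered by divisibility as the example poset, where the resulting values can be compared with the classical number-theoretic Möbius function.

```python
from functools import lru_cache

# Assumed example poset: divisors of 12 under divisibility.
P = (1, 2, 3, 4, 6, 12)


def leq(x, y):
    return y % x == 0


@lru_cache(maxsize=None)
def mobius(x, y):
    """Mobius function of the poset, computed by the recursion:
    1 if x == y; minus the sum of mobius(x, z) over x <= z < y if x < y; else 0."""
    if x == y:
        return 1
    if not leq(x, y):
        return 0
    return -sum(mobius(x, z) for z in P if leq(x, z) and leq(z, y) and z != y)


values = {y: mobius(1, y) for y in P}
print(values)  # {1: 1, 2: -1, 3: -1, 4: 0, 6: 1, 12: 0}
```

Memoization matters here: without it, the same values are recomputed once for every chain leading to an element, which grows quickly with the size of the poset.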

With the Möbius function and the rank function, the characteristic polynomial,
$\pi$, can be defined. Let $A$ be an arrangement, and let $L(A)$ be the cut-intersection
semi-lattice defined on $A$. Define $\mu(x) = \mu(V, x)$, where $V$ represents the entire space
and, hence, is the greatest lower bound of $L(A)$ since $L(A)$ is ordered by reverse
inclusion. Then, the characteristic polynomial of $A$ is defined in (36) as

$$\pi(A, t) = \sum_{x \in L(A)} \mu(x)(-t)^{r(x)}. \tag{13}$$

Note that for the special empty arrangement, $A = \emptyset$, by definition
$\pi(A, t) = 1$. Zaslavsky also showed that, given an arrangement $A$ and $a \in A$,
the following recursion formula for the characteristic polynomial holds:

$$\pi(A, t) = \pi(A', t) + t\,\pi(A'', t).$$

Then, since $\mathrm{card}(C(A))$ and $\pi(A, 1)$ have the same value for $A = \emptyset$ and satisfy the
same recursion for deletion and restriction, it is true, by uniqueness, that

$$\mathrm{card}(C(A)) = \pi(A, 1).$$

This is an important result that led to more analytical approaches for counting
chambers and other geometric entities. (50) (36)
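
The recursion in equation 12 and the sum in equation 13 can be checked numerically on small
arrangements. The following is a minimal sketch, not the method of (50) or (36): it assumes each
hyperplane is given as a pair `(v, tau)` meaning $v \cdot x = \tau$, identifies each nonempty
intersection with the set of hyperplanes that contain it, and uses exact rational arithmetic. The
function names `rank`, `flats`, and `poincare` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def flats(planes):
    """Map each nonempty intersection to its codimension.

    A flat is stored as the frozenset of indices of all hyperplanes containing
    it, so reverse inclusion of flats becomes ordinary inclusion of index sets.
    """
    rows = [list(v) + [tau] for v, tau in planes]
    found = {frozenset(): 0}  # the empty intersection is the whole space V
    for size in range(1, len(rows) + 1):
        for subset in combinations(range(len(rows)), size):
            aug = [rows[i] for i in subset]
            codim = rank([row[:-1] for row in aug])
            if codim != rank(aug):
                continue  # inconsistent system: the intersection is empty
            closure = frozenset(i for i in range(len(rows))
                                if rank(aug + [rows[i]]) == codim)
            found[closure] = codim
    return found

def poincare(planes, t):
    """Evaluate pi(A, t) = sum over x of mu(V, x) * (-t)**r(x)."""
    codim = flats(planes)
    mu = {}
    for x in sorted(codim, key=codim.get):  # smaller codimension first
        # mu(V, V) = 1; otherwise minus the sum of mu(V, z) over z strictly below x
        mu[x] = 1 if not x else -sum(mu[z] for z in mu if z < x)
    return sum(mu[x] * (-t) ** codim[x] for x in codim)

# Three lines in general position in the plane: x = 0, y = 0, x + y = 1.
general = [((1, 0), 0), ((0, 1), 0), ((1, 1), 1)]
print(poincare(general, 1))
```

Evaluating at $t = 1$ gives the chamber count, so the same call can be compared against a hand
count of regions for any small arrangement.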

5.4 The Lattice Structure and Combinatorial Geometry of Artificial Neural Networks

Sections 5.1-5.3 defined a lattice, the specifics of the cut-intersection
semi-lattice defined on the set of intersections of hyperplanes, and the analysis that can
be accomplished through the Poincaré polynomial. In parallel, this section provides
the details of the same structures defined on the set of sets derived from a
feed-forward, single hidden-layer, perceptron artificial neural network.

The reason for establishing a lattice structure on which to perform ANN capability
analysis is two-fold. The first is the general notion that, ultimately, an ordering
on ANN architectures is sought which is based on capability defined on invariants.
Well-behaved orderings are one of the outcomes of lattice theory, along with providing
a basis from which invariants can be extracted. Obtaining those invariants is
the second reason a lattice is sought, since invariants will be used to characterize the
complexity of signed sets.

5.4.1 Cut-Intersection Semi-Lattice of ANNs. Given a fixed feed-forward,
single hidden-layer, perceptron ANN, there is an implied arrangement of
hyperplanes which directly establishes a finite set of half-spaces. The intersection of
the half-spaces is accomplished through the logic layer of the network. The result
is a set of sets. Therefore, with reverse set inclusion providing the ordering,
intersection as the join operation, and the smallest intersection containing the union
as the meet, this is a specific example of the cut-intersection semi-lattice defined in
Section 5.2.

First, notation will be established. Consider the fixed ANN ($\omega_i$, $v_i$, $\tau_i$, $k$ fixed),

$$\sum_{i=1}^{k} \omega_i \, \mathcal{H}(v_i \cdot x - \tau_i), \tag{14}$$

where $x \in \mathbb{R}^d$ is the input vector, $\mathcal{H}$ is the Heaviside function, $v_i \in \mathbb{R}^d$ for $i = 1, \ldots, k$
are the weight vectors, $\tau_i \in \mathbb{R}$ for $i = 1, 2, \ldots, k$ are the threshold values at each
of the $k$ nodes, and $\omega_i \in \{0, 1\}$ for $i = 1, \ldots, k$ are known values. This defines
an instantiation of an architecture, which corresponds to a finite arrangement of
hyperplanes, $A = \{l_1, l_2, l_3, \ldots, l_k\}$, where

$$l_i = \{x \in \mathbb{R}^d : L_i(x) = v_i \cdot x - \tau_i = 0\}$$

and $L_i$ defines an affine mapping $\mathbb{R}^d \to \mathbb{R}$. That is, each $L_i$ corresponds to a
$(d - 1)$-dimensional hyperplane, $l_i = \{x \in \mathbb{R}^d : L_i(x) = 0\}$. Assume that the $k$
hyperplanes are distinct. Since each hyperplane naturally divides $\mathbb{R}^d$ in half, define
the corresponding set of half-spaces $H = \{h_1, h_2, h_3, \ldots, h_k\}$ as

$$h_i = \{x \mid v_i \cdot x - \tau_i > 0\} \subset \mathbb{R}^d.$$
Note that by switching the sign of $v_i$ and $\tau_i$, $h_i$ represents the "other half" of the
space, or the interior of the complement of the half-space. Define

$$h_i^c = \{x \mid v_i \cdot x - \tau_i < 0\} \subset \mathbb{R}^d.$$

Note that this is the interior of the complement of $h_i$, not exactly the complement.
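
To make the notation concrete, the sketch below evaluates the hidden layer and the weighted
output of equation 14 for one input vector. It is an illustration under stated assumptions: the
Heaviside function is taken to be 0 at the boundary (so that a 1 corresponds to membership in the
open half-space $h_i$), and the function names are hypothetical.

```python
def heaviside(s):
    """Heaviside step; the value at s == 0 is taken to be 0 here (an assumption)."""
    return 1 if s > 0 else 0

def hidden_layer(x, weights, thresholds):
    """One bit per hidden node: 1 if x lies in the open half-space h_i, else 0."""
    return [heaviside(sum(vj * xj for vj, xj in zip(v, x)) - tau)
            for v, tau in zip(weights, thresholds)]

def network_output(x, weights, thresholds, omegas):
    """Weighted sum of the hidden-layer bits, with omegas in {0, 1}."""
    bits = hidden_layer(x, weights, thresholds)
    return sum(w * b for w, b in zip(omegas, bits))

# Two hidden nodes in the plane: the half-spaces x1 > 0 and x2 > 0.
weights = [(1, 0), (0, 1)]
thresholds = [0, 0]
print(hidden_layer((1, -1), weights, thresholds))
print(network_output((1, -1), weights, thresholds, omegas=[1, 1]))
```

Two inputs lie in the same chamber of the arrangement exactly when neither lies on a hyperplane
and `hidden_layer` returns the same bits for both.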

Additionally, consider the set of all possible arrangements of hyperplanes in
$\mathbb{R}^d$. Define

$$\mathcal{A}^d = \{A = \{l_1, \ldots, l_k\} : k \in \mathbb{N},\ l_i = \{x : L_i(x) = 0\} \text{ for each } i = 1, 2, \ldots, k\}.$$

Note that by definition, any $A \in \mathcal{A}^d$ can be defined by equation 14. Also, define the
corresponding set of sets of half-spaces,

$$\mathcal{H}^d = \{H = \{h_1, h_2, h_3, \ldots, h_k\} : k \in \mathbb{N}\}.$$

Note that $\mathcal{A}^d$ and $\mathcal{H}^d$ can be partitioned depending on the type of architectures
that are being investigated for capability analysis. For example, a partition based
on the number of hyperplanes, $k$, in an arrangement gives a simple organization of
all architectures. Specifically, for each $k \in \mathbb{N}$, let

$$\mathcal{F}_k^d = \{A = \{l_1, l_2, l_3, \ldots, l_k\} \in \mathcal{A}^d\}.$$

Note that $\mathcal{F}_k^d$ can be thought of as the set of $k$-hyperplane arrangements generated
from a feed-forward, single hidden-layer, perceptron ANN with $k$ nodes in the
hidden layer and $d$-dimensional input, since each node corresponds to a hyperplane.
Moreover,

$$\mathcal{A}^d = \bigcup_{k \in \mathbb{N}} \mathcal{F}_k^d.$$

Hence, $\mathcal{A}^d$ is the set of all feed-forward, single hidden-layer, perceptron ANNs with
an arbitrary number of nodes in the hidden layer and $d$-dimensional input.

Now, consider the finite set of all nonempty intersections of the hyperplanes in an
arrangement $A$,

$$C_A = \Big\{ \bigcap_{l \in T} l \ :\ \bigcap_{l \in T} l \neq \emptyset,\ T \subseteq A \Big\}.$$

Define the partial ordering, $\preceq$, on $C_A$ as reverse set inclusion. In other words, let
$x, y \in C_A$; then $x \preceq y$ if and only if $y \subseteq x$.

Theorem 12 Given an arrangement, $A \in \mathcal{A}^d$, corresponding cut-intersection set,
$C_A$, and ordering, $\preceq$, then $(C_A, \preceq)$ is a partially ordered set.

Proof. Let $x, y, z \in C_A$. Clearly, for all $x \in C_A$, $x \preceq x$ since
$x \subseteq x$. Hence, $\preceq$ is reflexive. Assume $x \preceq y$ and $y \preceq x$. This implies that $y \subseteq x$ and
$x \subseteq y$. Hence, $x = y$. Therefore, $\preceq$ is antisymmetric. Assume $x \preceq y$ and $y \preceq z$. This
implies that $y \subseteq x$ and $z \subseteq y$, which implies $z \subseteq x$. Hence, $x \preceq z$, implying that $\preceq$ is
transitive.

Therefore, $(C_A, \preceq)$ is a partially ordered set. □

Now, define the join as

$$x \vee y = \mathrm{lub}\{x, y\} = x \cap y$$

and the meet as

$$x \wedge y = \mathrm{glb}\{x, y\} = \bigcap \{l \in A : l \supseteq x \cup y\}.$$

Theorem 13 Given an arrangement, $A \in \mathcal{A}^d$, the set $C_A$, the ordering $\preceq$, and the
operations $\vee$ and $\wedge$, then $(C_A, \preceq, \vee, \wedge)$ is a cut-intersection semi-lattice.

Proof. $(C_A, \preceq, \vee, \wedge)$ is a specific example of the cut-intersection
semi-lattice defined in general in Section 5.2. □

From here, refer to $(C_A, \preceq, \vee, \wedge)$ as the cut-intersection semi-lattice of a
feed-forward, single hidden-layer, perceptron artificial neural network.

5.4.2 The Characteristic Polynomial of $\mathcal{A}^d$. With the cut-intersection
semi-lattice, $(C_A, \preceq, \vee, \wedge)$, defined, the combinatorial geometry of the hyperplane
arrangement can be investigated. In particular, the characteristic polynomial can be
defined.

Consider the Möbius function defined as

$$\mu(x, y) = \begin{cases} 1 & \text{if } x = y \text{ and } x \in C_A \\ -\sum_{x \preceq z \prec y} \mu(x, z) & \text{if } x, y, z \in C_A,\ x \prec y \\ 0 & \text{else.} \end{cases} \tag{15}$$

Again, for convenience, define $\mu(x) = \mu(V, x)$, where $V$ is the greatest lower bound
of $C_A$, which is $\mathbb{R}^d$. Let $r(x) = \mathrm{codim}(x)$. Then, the characteristic polynomial
of any $A \in \mathcal{A}^d$ can be defined as

$$\pi(A, t) = \sum_{x \in C_A} \mu(x)(-t)^{r(x)}. \tag{16}$$

Now, the structure is in place to analytically and deterministically evaluate certain
invariants of an ANN architecture.

5.4.3 The Relationship Between the Poincaré Polynomial and the
Vapnik-Chervonenkis Dimension. What does the Poincaré polynomial have to do with
artificial neural networks? First, the fact that Zaslavsky introduced an analytical
method for determining the value of $\mathrm{card}(C(A))$ has made the venture into the study
of combinatorial geometry of arrangements of hyperplanes worthwhile. Chamber
cardinality, through Zaslavsky's work, is now an accessible geometric invariant. In
fact, it is now pertinent to consider the relationship of combinatorial geometry and
the arrangement of hyperplanes that results from the ANN architecture discussed in
Chapter I.

By Zaslavsky's result described above, for each $A \in \mathcal{A}^d$ there is a lattice $L(A)$
and corresponding Möbius and rank functions to define the characteristic polynomial,
$\pi$. Moreover, for each $A \in \mathcal{A}^d$, $\mathrm{card}(C(A)) = \pi(A, 1)$.

Let $A^* \in \mathcal{F}_k^d$ be an arrangement of hyperplanes such that the maximum number
of chambers is achieved. This happens when the arrangement is in general position
and $\mathrm{card}(A^*) = k$. (Recall that an arrangement is in general position if, when any
two planes have a common line, the line is distinct; and when any three planes have
a common point, the point is distinct.) More formally,

$$\mathrm{card}(C(A^*)) = \sup \{\mathrm{card}(C(A)) : A \in \mathcal{F}_k^d\}.$$

Define $L(A^*)$ as before with the reverse set inclusion ordering. Note that since $A^*$
is just an instance of an arbitrary arrangement of hyperplanes, $L(A^*)$ is a
cut-intersection semi-lattice. Therefore, it can substantiate the Möbius function, $\mu$, the
rank function, $r$, and the characteristic polynomial, $\pi$.

Theorem 14 Let $A^* \in \mathcal{F}_k^d$ be an arrangement of hyperplanes defined by a trained
ANN architecture such that $\mathrm{card}(C(A^*)) = \sup \{\mathrm{card}(C(A)) : A \in \mathcal{A}^d,\ \mathrm{card}(A) = k\}$.
Assume $A^*$ is in general position. Then,

$$VC(\mathcal{F}_k^d) = \mathrm{card}(C(A^*)).$$

Proof. Let $n = \mathrm{card}(C(A^*))$. Choose a set, $X \subset \mathbb{R}^d$, such that
$\mathrm{card}(X) = n$ and each $x \in X$ lies in a different chamber of $A^*$. Note that, since the points
are all separated from each other in the chambers, the hyperplanes of $A^*$ that
form the chambers can implement any dichotomy of $X$. This demonstrates a set of
$n$ points which can be shattered by the arrangement, $A^*$, and since $A^* \in \mathcal{F}_k^d$,
$VC(\mathcal{F}_k^d) \geq n$.

In order to show that $VC(\mathcal{F}_k^d)$ is no bigger than $n$, choose any signed set of
points $Y = (Y^+, Y^-)$ such that $\mathrm{card}(Y) > n$. Since the chambers of $A^*$ form a
partition of $\mathbb{R}^d$, the maximum number of chambers possible is $n$ (since $A^*$ is in
general position and $\mathrm{card}(A^*) = k$). There are more than $n$ points, and regardless
of the arrangement of points, at least one chamber will have more than one point in
it. Therefore, for each set $Y$ of greater than $n$ points, one of the dichotomies will
result in a $y^+ \in Y^+$ and a $y^- \in Y^-$ existing in the same chamber. In other words,
$A^*$ fails to shatter any set of $n + 1$ points. So, $VC(\mathcal{F}_k^d) < n + 1$.

Combining yields $VC(\mathcal{F}_k^d) = n = \mathrm{card}(C(A^*))$. □

While it is true that a fixed ANN may define a set of hyperplanes that are
not in general position (it is possible that there will be redundant processors or
processors that yield parallel hyperplanes), the more meaningful situation is one
where an ANN architecture is pushed to its capacity, resulting in an arrangement
of hyperplanes with maximal chambers. Hence, Theorem 14 describes a technique
for evaluating the V-C dimension for a set of ANN architectures based purely on
geometric properties of the hyperplanes. In the case where an arrangement is not in
general position, the capacity of that ANN would be less than the V-C dimension of
$\mathcal{F}_k^d$.
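
For reference, the maximal chamber count that Theorem 14 refers to has a standard closed form:
$k$ hyperplanes in general position in $\mathbb{R}^d$ cut the space into $\sum_{i=0}^{d} \binom{k}{i}$
chambers. The sketch below only evaluates that count; the function name `max_chambers` is
illustrative and does not appear in the source.

```python
from math import comb

def max_chambers(k, d):
    """Number of chambers cut out by k hyperplanes in general position in R^d."""
    return sum(comb(k, i) for i in range(d + 1))

# Three lines in general position in the plane.
print(max_chambers(3, 2))
```

When $k \leq d$ the sum collapses to $2^k$, because every sign pattern of the $k$ half-spaces is
realized by some chamber.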

5.4.4 Duality of Points and Hyperplanes. The reformulation of the V-C
dimension falls out cleanly because a relationship exists naturally between the
arrangement of the hyperplanes and the arbitrary (and unsigned) set of points in $\mathbb{R}^d$.
For V-C dimension, the fact that only one shatterable set of points must exist
in order to value the V-C dimension as that set's cardinality also contributes to the
unique relationship. The proof essentially equates a chamber with a point, allowing
for the tie between the cardinality of a set of chambers and the cardinality of the set of
points.

It is not always the case that there exists a relationship between an invariant
of the arrangement and the signed set. Other examples require a little finesse by
appealing to the duality of hyperplanes and points. Zaslavsky uses this duality
often to achieve many of the results of his counting arguments (50:4). Hence, where
invariant analysis, thus far, has been about the structure of the arrangement of
hyperplanes, by duality, it can also be about a set of points. This is accomplished
through a mapping that equates hyperplanes in a $d - 1$ dimensional subspace to
points in a $d$ dimensional vector space. This is an important tie because the new
research presented in Chapter VI and Chapter VII is based on invariant analysis of
signed sets of points.

5.4.5 The Lattice of the ANN Chamber Set. Recall that Theorem 14
establishes a relationship between the chambers and ANN capability analysis based
on V-C dimension. Note that while a set of chambers is referred to as an invariant
in the literature, it also possesses a lattice structure. The set of chambers produced
by an ANN can be defined in this context.

Let $H^*(A) = \{h_1, h_2, h_3, \ldots, h_k, h_1^c, h_2^c, h_3^c, \ldots, h_k^c\}$, where $h_i^c$ denotes the interior
of the complement of $h_i$. Then, the set of chambers of an arrangement, $A$, is defined
by

$$C(A) = \Big\{ \bigcap_{h \in T} h \ :\ \bigcap_{h \in T} h \neq \emptyset,\ T \subseteq H^*(A) \Big\}.$$

Define $\preceq$ as set inclusion. That is, for $x, y \in C(A)$, $x \preceq y$ if and only if $x \subseteq y$. Then,
the meet, $\wedge$, is defined as

$$x \wedge y = \mathrm{glb}\{x, y\} = x \cap y.$$

Figure 5. Chambers x and y.

By definition of $C(A)$, the meet operation is a closed operation on $C(A)$. The join,
$\vee$, is defined as

$$x \vee y = \mathrm{lub}\{x, y\} = x \cup y.$$
Note that the set $C(A)$ is not closed with respect to the join operation, as
demonstrated in Figure 5. In the figure, the set $x \cup y$ is not contained in $C(A)$. Note that
each of the chambers $x$ and $y$ is an intersection of three half-spaces drawn from $H^*(A)$.

Theorem 15 Let $A$ be an arrangement of hyperplanes generated by an ANN. Let
$\preceq$ be set inclusion and $\wedge$ and $\vee$ be as defined above. Then, $(C(A), \preceq, \wedge, \vee)$ is a meet
semi-lattice.

Proof. First, show $(C(A), \preceq)$ is a partially ordered set. Let
$x, y, z \in C(A)$. Since $x \subseteq x$, $x \preceq x$. Hence, $\preceq$ is reflexive.

Assume $x \preceq y$ and $y \preceq x$. This implies $x \subseteq y$ and $y \subseteq x$, which implies $x = y$.
Hence, $\preceq$ is antisymmetric.

Assume $x \preceq y$ and $y \preceq z$. This implies $x \subseteq y$ and $y \subseteq z$, which implies $x \subseteq z$.
Hence $x \preceq z$, i.e. $\preceq$ is transitive.

Since $\preceq$ is reflexive, antisymmetric, and transitive for all $x, y, z \in C(A)$,
$(C(A), \preceq)$ is a partially ordered set. Moreover, $C(A)$ is closed with respect to the
meet operation, $\wedge$, which implies $(C(A), \preceq, \wedge, \vee)$ is a meet semi-lattice. □

Note that $(C(A), \preceq, \wedge, \vee)$ is not a lattice since the join is not closed. However,
it is possible to produce a lattice on the chambers produced by the hyperplane
arrangement of an ANN. Consider an architecture with an additional hidden layer
such that the interconnection weights are binary. This produces an additional logic
operation. With the additional hidden layer, the set of sets represented includes
$C(A)$ and the union of all the elements of $C(A)$ which, by definition, is the power
set of $C(A)$, denoted $\mathcal{P}(C(A))$.

Theorem 16 Given an arrangement $A \in \mathcal{A}^d$, corresponding set, $C(A)$, ordering,
$\preceq$, and meet and join operations as defined above, $(\mathcal{P}(C(A)), \preceq, \wedge, \vee)$ is a lattice.

Proof. First, show $(\mathcal{P}(C(A)), \preceq)$ is a partially ordered set.

Let $x, y, z \in \mathcal{P}(C(A))$. Since $x \subseteq x$, $x \preceq x$. Hence, $\preceq$ is reflexive. Assume
$x \preceq y$ and $y \preceq x$. This implies $x \subseteq y$ and $y \subseteq x$, which implies $x = y$. Hence, $\preceq$
is antisymmetric. Assume $x \preceq y$ and $y \preceq z$. This implies $x \subseteq y$ and $y \subseteq z$, which
implies $x \subseteq z$. Hence $x \preceq z$, i.e. $\preceq$ is transitive.

Since $\preceq$ is reflexive, antisymmetric, and transitive for all $x, y, z \in \mathcal{P}(C(A))$,
$(\mathcal{P}(C(A)), \preceq)$ is a partially ordered set.

Moreover, by definition, $\mathcal{P}(C(A))$ is closed with respect to the meet and the
join operations, which implies $(\mathcal{P}(C(A)), \preceq, \wedge, \vee)$ is a lattice. □
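
The contrast between Theorems 15 and 16 can be illustrated with a toy model. In the sketch below
each chamber is an opaque label, a region is a set of labels, the meet is set intersection and the
join is set union. Treating two distinct chambers as having an empty meet, and including the empty
set in the family, are modeling assumptions made here; the labels and function names are
hypothetical.

```python
from itertools import combinations

def power_set(items):
    """All subsets of `items`, each as a frozenset (a union of chambers)."""
    items = list(items)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def closed_under(family, op):
    """True if op(x, y) stays inside `family` for every pair x, y."""
    return all(op(x, y) in family for x in family for y in family)

def meet(x, y):
    return x & y

def join(x, y):
    return x | y

# Hypothetical labels for the 7 chambers of three lines in general position.
chambers = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]

# Single chambers plus the empty set: closed under meet, not under join.
single = {frozenset([c]) for c in chambers} | {frozenset()}
# Every union of chambers: closed under both operations.
unions = power_set(chambers)

print(closed_under(single, meet), closed_under(single, join))
print(closed_under(unions, meet), closed_under(unions, join))
```

The second family is what the additional logic layer is described as representing: any union of
chambers becomes available, so the join no longer leaves the family.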

5.5  Conclusions 

In summary, this chapter described the required concepts of lattice theory and
combinatorial geometry that facilitated an investigation of the lattice structures of
feed-forward, single hidden-layer, perceptron artificial neural networks. This paved
the way for establishing that V-C dimension can be equated with the invariant chamber
cardinality. Additionally, the cut-intersection semi-lattice of feed-forward, single
hidden-layer, perceptron artificial neural networks was defined, which facilitated the
definition of the characteristic polynomial. Moreover, the set of chambers produced
by an ANN was also found to be a semi-lattice.
VI. Generalized Invariant Analysis Applied to Artificial Neural Network Capability Analysis

This chapter pulls together the concepts of lattice theory and combinatorial
geometry outlined in Chapter V and the desired properties of a quantifier of ANN
capabilities outlined in Chapter IV. A framework will be built that generalizes methods
used to determine ANN capabilities. The purpose of deriving the generalized
framework is to build a method for comparing alternative architectures for their
parsimonious solutions to the classification problem. In other words, a partial ordering
of sets of sets derived from different architectures is sought. The ordering, to be
presented, is determined by the complexity of signed sets for which an architecture
can provide a solution. Hence, the crux of ANN capability analysis is reduced to
characterizing this notion of complexity of signed sets. This is achieved through
invariant analysis.

This view of capabilities analysis is consistent with V-C dimension. In other
words, V-C dimension analysis can be posed as a specific instantiation of the generalized
framework where cardinality is the invariant and V-C dimension is the function.
However, the premise of this work is that V-C dimension has significant faults and,
with the proper mathematics, more appropriate quantifiers can be designed.
Specifically, the Ox-Cart dimension, which is based on an invariant called the geometric
complexity, will be defined in Chapter VII.

Before  any  of  the  ANN  capabilities  analysis  can  be  attempted,  it  is  necessary 
to  have  a  clear  and  concise  definition  of  the  problem  space.  Therefore,  the  definition 
and  properties  of  the  set  of  signed  sets  is  investigated  thoroughly. 

6.1  Generalizing  the  Problem  Space 

In this section, the mathematical structure of the set of signed sets will be
presented. In addition, the desired invariant properties of signed sets will be
defined. Moreover, these properties will be used to show the invariant nature of the
mapping, geometric complexity.

6.1.1 The Collection of Signed Sets. The domain of the mappings that
will define the set of invariants in the generalized framework is the set of signed sets.
Recall that an ordered pair, $(x, y)$, is a set of sets $\{\{x\}, \{x, y\}\}$ and $(x, y) \neq (y, x)$
unless $x = y$. Now, consider the following definition of a signed set.

Definition. A signed set, $X^s$, on $\mathbb{R}^d$ is an ordered pair of sets, $(X^+, X^-) \in \mathcal{P}(\mathbb{R}^d) \times
\mathcal{P}(\mathbb{R}^d)$ such that $X^+ \cap X^- = \emptyset$. The corresponding unsigned set of $X^s$ is $X =
X^+ \cup X^-$.

Note that since $X^s = (X^+, X^-)$ is an ordered pair and $X^+ \cap X^- = \emptyset$,
$(X^+, X^-) \neq (X^-, X^+)$ unless $X^s$ is the signed empty set (which is just the empty
set and will be denoted $\emptyset^s$). Hence, consider the rigorous definition of an ordered
pair to establish signed set equality, denoted $\stackrel{s}{=}$. The ordered pair $(X^+, X^-)$ can be
uniquely defined by $\{\{X^+\}, \{X^+, X^-\}\}$.

Definition. Two signed sets, $X_1^s$ and $X_2^s$, are said to be equal, written $X_1^s \stackrel{s}{=} X_2^s$, if
and only if $X_1^+ = X_2^+$ and $X_1^- = X_2^-$.

Let $\mathcal{X}$ denote the set of all signed sets on $\mathbb{R}^d$, i.e., $\mathcal{X} = \{X^s : X^s$ is a signed
set$\}$. Define the zero signed element in $\mathcal{X}$ as $\emptyset^s = (\emptyset, \emptyset)$.

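Consider the following definition of scalar multiplication on $\mathcal{X}$.

Definition. Signed set scalar multiplication is defined as

$$\alpha X^s \stackrel{s}{=} (\alpha X^+, \alpha X^-)$$

for all $\alpha \in \mathbb{R}$, $\alpha \neq 0$ and $X^s \in \mathcal{X}$. Recall that $\alpha X = \{\alpha x : x \in X\}$ for all
$X \subseteq \mathbb{R}^d$.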

Lemma 6 The set $\mathcal{X}$ is closed with respect to scalar multiplication.

Proof. Assume not. That is, without loss of generality assume
$\alpha X^+ \cap \alpha X^- \neq \emptyset$ for some $\alpha \in \mathbb{R}$, $\alpha \neq 0$ and $X^s \in \mathcal{X}$. This implies that there
exists $x_\alpha \in \alpha X^+$ with $x_\alpha \in \alpha X^-$. Since $\alpha \neq 0$, there exists a unique $x$ such that
$x_\alpha = \alpha x$ and, moreover, $x \in X^+$ and $x \in X^-$. This implies $X^+ \cap X^- \neq \emptyset$, which is a
contradiction. Hence, $\alpha X^+ \cap \alpha X^- = \emptyset$. Therefore, $\mathcal{X}$ is closed with respect to scalar
multiplication. □
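
A small sketch of these definitions follows, assuming points are stored as tuples of numbers and
a signed set as a pair of frozensets. The names `make_signed_set` and `scale` are hypothetical.
The disjointness check inside `make_signed_set` is what Lemma 6 guarantees will pass for any
nonzero scalar.

```python
def make_signed_set(positive, negative):
    """Build (X+, X-) and enforce the disjointness required of a signed set."""
    pos, neg = frozenset(positive), frozenset(negative)
    if pos & neg:
        raise ValueError("X+ and X- must be disjoint")
    return (pos, neg)

def scale(alpha, signed_set):
    """Signed set scalar multiplication: alpha * (X+, X-) = (alpha X+, alpha X-)."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    pos, neg = signed_set

    def scale_set(points):
        return frozenset(tuple(alpha * c for c in p) for p in points)

    # Rebuilding through make_signed_set re-checks disjointness (Lemma 6).
    return make_signed_set(scale_set(pos), scale_set(neg))

X = make_signed_set({(1, 2)}, {(0, 1)})
print(scale(2, X))
```

Excluding `alpha == 0` matters: multiplying by zero would send every point to the origin and
could make the two components overlap.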

6.1.2 Invariant Properties for Operations on Signed Sets. The generalized
framework for determining ANN capabilities will be centered on invariance. The
following transformations, defined on $\mathcal{X}$, will help formalize mathematically the
invariance desired of a mapping. Specifically, it is desired that the mappings that
characterize signed sets will be invariant to dilation, translation, or rotation of the
signed sets.

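Let $\mathcal{Y}$ be the collection of finite sets $Y$ such that $Y \subset \mathbb{R}^d$, that is,
$\mathcal{Y} = \{Y \subset \mathbb{R}^d : \mathrm{card}(Y)$ is finite$\}$.

Definition. Let $\gamma \in \mathbb{R}^+$. Define $D_\gamma : \mathcal{P}(\mathbb{R}^d) \to \mathcal{P}(\mathbb{R}^d)$ as

$$D_\gamma(Y) = \{\gamma y \in \mathbb{R}^d : y \in Y\}$$

for all $Y \in \mathcal{P}(\mathbb{R}^d)$. Then, the dilation operator, $\mathbf{D}_\gamma : \mathcal{X} \to \mathcal{X}$, is defined as

$$\mathbf{D}_\gamma(X^s) = (D_\gamma(X^+), D_\gamma(X^-))$$

for all $X^s \in \mathcal{X}$.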

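Definition. Let $x_0 \in \mathbb{R}^d$. Define $T_{x_0} : \mathcal{P}(\mathbb{R}^d) \to \mathcal{P}(\mathbb{R}^d)$ as

$$T_{x_0}(Y) = \{(x_0 + y) \in \mathbb{R}^d : y \in Y\}$$

for all $Y \in \mathcal{P}(\mathbb{R}^d)$. Then, the translation operator, $\mathbf{T}_{x_0} : \mathcal{X} \to \mathcal{X}$, is defined as

$$\mathbf{T}_{x_0}(X^s) = (T_{x_0}(X^+), T_{x_0}(X^-))$$

for all $X^s \in \mathcal{X}$.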

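Definition. Let $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_{d-1})$ be a vector of rotation angles. Define
$W_\lambda : \mathcal{P}(\mathbb{R}^d) \to \mathcal{P}(\mathbb{R}^d)$ as

$$W_\lambda(Y) = \{r_\lambda(y) \in \mathbb{R}^d : y \in Y\}$$

for all $Y \in \mathcal{P}(\mathbb{R}^d)$, where $r_\lambda : \mathbb{R}^d \to \mathbb{R}^d$ is a vector rotation operator that can be
represented by an orthogonal matrix multiplication with the angle $\lambda$.
Then, the rotation operator, $\mathbf{W}_\lambda : \mathcal{X} \to \mathcal{X}$, is defined as

$$\mathbf{W}_\lambda(X^s) = (W_\lambda(X^+), W_\lambda(X^-))$$

for all $X^s \in \mathcal{X}$.

Note that both $\mathbf{D}_\gamma$ and $\mathbf{W}_\lambda$ are linear. However, $\mathbf{T}_{x_0}$ is affine.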
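
The three operators can be sketched directly from their definitions. The code below is an
illustration only: it assumes points are tuples, restricts rotation to the plane ($d = 2$, a
single angle) to keep the orthogonal matrix explicit, and uses hypothetical function names. The
helper `on_signed_set` lifts a set-level operator to a signed set by applying it to each
component.

```python
import math

def dilate(gamma, points):
    """D_gamma(Y) = {gamma * y : y in Y}, with gamma > 0."""
    return {tuple(gamma * c for c in p) for p in points}

def translate(x0, points):
    """T_x0(Y) = {x0 + y : y in Y}."""
    return {tuple(a + c for a, c in zip(x0, p)) for p in points}

def rotate2d(angle, points):
    """W_lambda(Y) in the plane: multiply each point by a rotation matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return {(c * x - s * y, s * x + c * y) for x, y in points}

def on_signed_set(op, signed_set):
    """Lift a set operator to signed sets: (X+, X-) -> (op(X+), op(X-))."""
    pos, neg = signed_set
    return (op(pos), op(neg))

X = ({(0, 0)}, {(1, 1)})
print(on_signed_set(lambda s: translate((1, 0), s), X))
print(on_signed_set(lambda s: dilate(2, s), X))
```

Because each operator is injective on points, the lifted operators keep the two components
disjoint, so the image is again a signed set.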

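Convex hulls of sets will also be required.

Definition. Let $\alpha_1, \alpha_2, \ldots, \alpha_n \in \mathbb{R}^+$ such that

$$\sum_{i=1}^{n} \alpha_i = 1.$$

The convex hull of a set $Y \in \mathcal{Y}$, denoted $co(Y)$, is the set

$$co(Y) = \Big\{ \sum_{i=1}^{n} \alpha_i y_i \in \mathbb{R}^d : \{y_1, y_2, \ldots, y_n\} \subseteq Y,\ n \in \mathbb{N},\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\}.$$

Let $\mathcal{X}_f$ denote the set of finite signed sets in $\mathcal{X}$.

Definition. Define the convex hull of a signed set $X^s \in \mathcal{X}_f$ to be

$$co(X^s) = (co(X^+), co(X^-)).$$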
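It will be important for each of the desired invariant properties defined as
transformations above to preserve convexity. Hence, consider the following lemmas.

Lemma 7 Let $\gamma \in \mathbb{R}^+$. Then, for each $X^s \in \mathcal{X}_f$, $\mathbf{D}_\gamma(co(X^s)) = co(\mathbf{D}_\gamma(X^s))$.

Proof. Let $\gamma \in \mathbb{R}^+$, $Y \in \mathcal{Y}$ with $\mathrm{card}(Y) = n$. First, show
$D_\gamma$ preserves convexity, i.e. $D_\gamma(co(Y)) = co(D_\gamma(Y))$.

$$\begin{aligned}
D_\gamma(co(Y)) &= D_\gamma\Big(\Big\{ \textstyle\sum_{i=1}^{n} \alpha_i y_i : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\}\Big) \\
&= \Big\{ \gamma \textstyle\sum_{i=1}^{n} \alpha_i y_i : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\} \\
&= \Big\{ \textstyle\sum_{i=1}^{n} \alpha_i (\gamma y_i) : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\} \\
&= co(D_\gamma(Y)).
\end{aligned}$$

Therefore, for each $X^s \in \mathcal{X}_f$,

$$\begin{aligned}
\mathbf{D}_\gamma(co(X^s)) &\stackrel{s}{=} \mathbf{D}_\gamma(co(X^+), co(X^-)) \\
&\stackrel{s}{=} (D_\gamma(co(X^+)), D_\gamma(co(X^-))) \\
&\stackrel{s}{=} (co(D_\gamma(X^+)), co(D_\gamma(X^-))) \\
&\stackrel{s}{=} co(\mathbf{D}_\gamma(X^s)). \quad \square
\end{aligned}$$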

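Lemma 8 Let $x_0 \in \mathbb{R}^d$. Then, for each $X^s \in \mathcal{X}_f$, $\mathbf{T}_{x_0}(co(X^s)) = co(\mathbf{T}_{x_0}(X^s))$.

Proof. Let $x_0 \in \mathbb{R}^d$, $Y \in \mathcal{Y}$ with $\mathrm{card}(Y) = n$. First, show
$T_{x_0}$ preserves convexity, i.e. $T_{x_0}(co(Y)) = co(T_{x_0}(Y))$. Consider

$$\begin{aligned}
T_{x_0}(co(Y)) &= T_{x_0}\Big(\Big\{ \textstyle\sum_{i=1}^{n} \alpha_i y_i : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\}\Big) \\
&= \Big\{ x_0 + \textstyle\sum_{i=1}^{n} \alpha_i y_i : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\} \\
&= \Big\{ \textstyle\sum_{i=1}^{n} \alpha_i x_0 + \sum_{i=1}^{n} \alpha_i y_i : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\} \\
&= \Big\{ \textstyle\sum_{i=1}^{n} \alpha_i (x_0 + y_i) : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\} \\
&= co(T_{x_0}(Y)).
\end{aligned}$$

Therefore, for each $X^s \in \mathcal{X}_f$,

$$\begin{aligned}
\mathbf{T}_{x_0}(co(X^s)) &\stackrel{s}{=} \mathbf{T}_{x_0}(co(X^+), co(X^-)) \\
&\stackrel{s}{=} (T_{x_0}(co(X^+)), T_{x_0}(co(X^-))) \\
&\stackrel{s}{=} (co(T_{x_0}(X^+)), co(T_{x_0}(X^-))) \\
&\stackrel{s}{=} co(\mathbf{T}_{x_0}(X^s)). \quad \square
\end{aligned}$$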

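Lemma 9 Let $\lambda$ be a vector of rotation angles. Then, for each $X^s \in \mathcal{X}_f$, $\mathbf{W}_\lambda(co(X^s)) = co(\mathbf{W}_\lambda(X^s))$.

Proof. Let $Y \in \mathcal{Y}$ with $\mathrm{card}(Y) = n$. First, show
$W_\lambda$ preserves convexity, i.e. $W_\lambda(co(Y)) = co(W_\lambda(Y))$. Consider

$$\begin{aligned}
W_\lambda(co(Y)) &= W_\lambda\Big(\Big\{ \textstyle\sum_{i=1}^{n} \alpha_i y_i : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\}\Big) \\
&= \Big\{ r_\lambda\Big(\textstyle\sum_{i=1}^{n} \alpha_i y_i\Big) : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\} \\
&= \Big\{ \textstyle\sum_{i=1}^{n} \alpha_i r_\lambda(y_i) : y_i \in Y,\ \sum_{i=1}^{n} \alpha_i = 1,\ \alpha_i \geq 0 \Big\} \\
&= co(W_\lambda(Y)).
\end{aligned}$$

Therefore, for each $X^s \in \mathcal{X}_f$,

$$\begin{aligned}
\mathbf{W}_\lambda(co(X^s)) &\stackrel{s}{=} \mathbf{W}_\lambda(co(X^+), co(X^-)) \\
&\stackrel{s}{=} (W_\lambda(co(X^+)), W_\lambda(co(X^-))) \\
&\stackrel{s}{=} (co(W_\lambda(X^+)), co(W_\lambda(X^-))) \\
&\stackrel{s}{=} co(\mathbf{W}_\lambda(X^s)). \quad \square
\end{aligned}$$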
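
The key step in Lemmas 7 and 8 is that a convex combination commutes with the operator, which
for translation relies on the coefficients summing to one. The sketch below checks that step on
one example with exact rational arithmetic. It is a spot check rather than a proof, and the
function name `convex_combination` is hypothetical.

```python
from fractions import Fraction

def convex_combination(alphas, points):
    """Return sum_i alpha_i * y_i for nonnegative alphas that sum to 1."""
    if sum(alphas) != 1 or any(a < 0 for a in alphas):
        raise ValueError("alphas must be nonnegative and sum to 1")
    dim = len(points[0])
    return tuple(sum(a * p[j] for a, p in zip(alphas, points)) for j in range(dim))

alphas = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
points = [(0, 0), (3, 0), (0, 6)]
x0, gamma = (2, -1), 3

# Translate the combination, then compare with the combination of translated points.
combo = convex_combination(alphas, points)
translated_combo = tuple(a + c for a, c in zip(x0, combo))
combo_of_translated = convex_combination(
    alphas, [tuple(a + c for a, c in zip(x0, p)) for p in points])

# Same comparison for dilation by gamma.
dilated_combo = tuple(gamma * c for c in combo)
combo_of_dilated = convex_combination(
    alphas, [tuple(gamma * c for c in p) for p in points])

print(translated_combo == combo_of_translated, dilated_combo == combo_of_dilated)
```

Rotation follows the same pattern because $r_\lambda$ is linear; it is omitted here only to keep
the arithmetic exact.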

This  section  has  established  a  basic  knowledge  of  signed  sets  which  represent 
the  classification  problem. 

6.2  Generalizing  Artificial  Neural  Network  Capability  Quantifiers 

Measurements of capability of ANNs can be viewed as functions of invariants
of an arrangement $A$. The point is to broaden the understanding of how
to characterize the strength of an arrangement which corresponds to an ANN
architecture. The goal is to identify mathematical entities that vary as the system's
complexity varies and are invariant to characterizations of a system which do not
contribute to an accurate description of an ANN's ability to solve classification
problems.

Throughout the literature (and in this dissertation), reference is
made to the goal of measuring ANN capabilities. Also, V-C based functions are
sometimes referred to as measures. Each of these terms is used loosely. In fact,
V-C based quantifiers are not measures in the mathematical sense. Moreover, the
quantifiers that will be defined by the general framework define a function of some
invariant. As it turns out, the invariants are semi-measures. Consider the definition
of a measure and a semi-measure.

Definition. Let $\mathcal{T}$ be a collection of sets $Y$ such that $Y \subseteq \mathbb{R}^d$. A mapping, $m$,
defined on a set $Y \in \mathcal{T}$ such that $m(Y) \in \mathbb{R}$ is a measure if it has the following
properties:

1. $m(\emptyset) = 0$ (where $\emptyset$ is the empty set).

2. $m(Y) > 0$ for all nonempty $Y \in \mathcal{T}$.

3. For any finite set of finite disjoint sets, $\{Y_1, Y_2, \ldots, Y_n\} \subseteq \mathcal{T}$,
$m\big(\bigcup_{i=1}^{n} Y_i\big) = \sum_{i=1}^{n} m(Y_i)$.

Definition. A semi-measure is a mapping $m$ defined on a set $Y$ such that $m(Y) \in \mathbb{R}$
and $m$ has Properties 1 and 2 of the above definition.

V-C dimension is not a measure since it fails to satisfy Property 3. In fact,
this is a property that is undesirable for ANN capability quantifiers. As an example,
consider two identical ANNs, each with only one perceptron that corresponds to the
x-axis in $\mathbb{R}^2$. The addition of the two nets produces the same set of hyperplanes,
namely, the x-axis, and the V-C dimension is the same. Hence the V-C dimension is
not additive.

6.2.1 The Set of Invariants. In this section, the notion of invariants will
be formalized. A set of invariants will be defined. Consider the following definition
of an invariant. (Note that this definition is specific to this dissertation.)

Let $\mathcal{Z} = \{f : \mathcal{S} \to \mathbb{R} \mid f(\emptyset) = 0\}$ where $\mathcal{S}$ is a nonempty set. Then $\mathcal{Z}$ is a linear
space over $\mathbb{R}$ with point-wise addition and scalar multiplication. Let $\mathcal{I} = \{I \in \mathcal{Z} \mid
I(S) \geq 0$ for all $S \in \mathcal{S}\}$.

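Lemma 10 $\mathcal{I}$ is a convex cone in $\mathcal{Z}$.

Proof. Convexity: Let $\alpha \in [0, 1]$ and $I_1, I_2 \in \mathcal{I}$; then, by
linearity of $\mathcal{Z}$,

$$[\alpha I_1 + (1 - \alpha) I_2](S) = \alpha I_1(S) + (1 - \alpha) I_2(S) \geq 0$$

for all $S \in \mathcal{S}$. Therefore $\alpha I_1 + (1 - \alpha) I_2 \in \mathcal{I}$.

Cone: Let $a \geq 0$, $I \in \mathcal{I}$; then $[aI](S) = aI(S) \geq 0$. So $aI \in \mathcal{I}$. Hence, $\mathcal{I}$ is a
convex cone in $\mathcal{Z}$. □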

Definition. An element $I \in \mathcal{I}$ is said to be an invariant with respect to a family,
$\mathcal{M}$, of mappings $M : \mathcal{I} \to \mathcal{I}$ if $M(I) = I$ for all $M \in \mathcal{M}$. That is,

$$[M(I)](S) = I(S) \text{ for all } S \in \mathcal{S}.$$

Consider the following example. Recall the definition of the translation
operator $T_{x_0}$.

Example. Let the invariant, $I \in \mathcal{I}$, be defined as $I = \pi_1$, where $\pi_1 = \pi(L(A), 1)$,
the characteristic polynomial evaluated at $t = 1$ defined on a cut-intersection
semi-lattice of an arrangement of hyperplanes, $L(A)$. Hence, $\mathcal{S} = L(A)$. Define $M \in \mathcal{M}$
as

$$M(\pi_1) = \pi_1 \circ \hat{T}_{x_0}.$$

Define $\hat{T}_{x_0}(L(A)) = \{T_{x_0}(a) : a \in L(A)\}$. The claim is that for some fixed $x_0 \in \mathbb{R}^d$,
$M(\pi_1) = \pi_1$ for all $L(A)$. Given $L(A)$, consider

$$\begin{aligned}
[M(\pi_1)](L(A)) &= [\pi_1 \circ \hat{T}_{x_0}](L(A)) \\
&= \pi_1(\hat{T}_{x_0}(L(A))) \\
&= \pi(\hat{T}_{x_0}(L(A)), 1) \\
&= \mathrm{card}(C(\hat{T}_{x_0}(L(A)))) \\
&= \mathrm{card}(C(L(A))) \\
&= \pi(L(A), 1) = \pi_1(L(A)).
\end{aligned}$$

Hence, $M(\pi_1) = \pi_1$ for all $L(A)$. Therefore, $\pi_1$ is an invariant with respect to $T_{x_0}$.
Note that this invariant maps arrangements to real numbers, i.e. $\pi_1 : L(A) \to \mathbb{R}$.

Consider  another  example  of  an  invariant  which  does  not  rely  on  the  existence 
of  a  cut-intersection  semi-lattice  or  any  other  lattice.  Instead  it  maps  sets  to  real 
numbers. 

Definition. Given $M : \mathcal{S} \to \mathcal{S}$, define $M^T : \mathcal{I} \to \mathcal{I}$ as the transpose of $M$, written

\[ M^T(I) = I \circ M \quad \text{for all } I \in \mathcal{I}. \]

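Example. Let $\mathcal{S} = \mathcal{X}$, a collection of finite subsets of $\mathbb{R}^d$. Let the invariant, $I$, on an unsigned set, $X \in \mathcal{X}$, be defined as $I = \mathrm{card}(\cdot)$. Let

\[ \mathcal{M} = \{ W_A^T : A \in U \} \cup \{ D_\gamma^T : \gamma \in \mathbb{R}^+ \} \cup \{ T_{x_0}^T : x_0 \in \mathbb{R}^d \}. \]

The claim is that $M^T(\mathrm{card}(\cdot)) = \mathrm{card}(\cdot)$ for all $M^T \in \mathcal{M}$. Consider $M = T_{x_0}$:

\[
\begin{aligned}
[M^T(\mathrm{card}(\cdot))](X) &= [\mathrm{card}(\cdot) \circ T_{x_0}](X) \\
&= \mathrm{card}(T_{x_0}(X)) \\
&= \mathrm{card}(X).
\end{aligned}
\]

Hence, $T_{x_0}^T(\mathrm{card}(\cdot)) = \mathrm{card}(\cdot)$ for all $x_0 \in \mathbb{R}^d$. Similarly, $W_A^T(\mathrm{card}(\cdot)) = \mathrm{card}(\cdot)$ for all $A \in U$, and for all $\gamma \in \mathbb{R}^+$, $D_\gamma^T(\mathrm{card}(\cdot)) = \mathrm{card}(\cdot)$. Additionally, note that $\mathrm{card}(\cdot)$ is a semi-measure. Therefore, cardinality is an invariant with respect to $\mathcal{M}$.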
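The cardinality example can be checked numerically. The following is a minimal sketch, not part of the original development: the helper names (`translate`, `dilate`, `rotate90`, `transpose`) are hypothetical, the point set is made up, and a quarter turn in the plane is assumed to be a representative member of the rotational family $W_A$.

```python
def translate(points, x0):
    """T_{x0}: shift every point by the vector x0."""
    return {tuple(p_i + x_i for p_i, x_i in zip(p, x0)) for p in points}

def dilate(points, gamma):
    """D_gamma: scale every point by a positive factor gamma."""
    return {tuple(gamma * p_i for p_i in p) for p in points}

def rotate90(points):
    """A quarter turn in the plane, assumed here to stand in for W_A."""
    return {(-y, x) for (x, y) in points}

def transpose(M):
    """M^T(I) = I o M: lift a map on sets to a map on invariants."""
    return lambda I: (lambda X: I(M(X)))

card = len  # the cardinality invariant on finite sets

# Hypothetical finite point set in R^2.
X = {(0, 0), (1, 0), (0, 2), (3, 1)}
maps = [lambda P: translate(P, (5, -2)), lambda P: dilate(P, 3), rotate90]

# Apply M^T(card) to X for each map M; every entry should equal card(X).
results = [transpose(M)(card)(X) for M in maps]
print(card(X), results)
```

Each map is a bijection on the plane, so distinct points stay distinct and the count is unchanged, which is the content of the claim above.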

Because an ANN capability quantifier should be about signed sets, what is sought is a set of invariants and a family of mappings that are defined on signed sets. Hence, consider the following definition for a set of invariants on signed sets, $\mathcal{I}^s$, which will be used to define the geometric complexity and the Ox-Cart dimension in Chapter VII.

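Definition. Let $\mathcal{S} = \mathcal{X}^s$, the set of signed sets on $\mathbb{R}^d$. Given $X^s \in \mathcal{X}^s$, define the invariant on signed sets, $I \in \mathcal{I}^s$, as $I : \mathcal{X}^s \to \mathbb{Z}^+$. Let $\mathcal{M}^s = \{ W_A^T, D_\gamma^T, T_{x_0}^T \}$ where $A \in U$, $\gamma \in \mathbb{R}^+$, and $x_0 \in \mathbb{R}^d$. That is, for $M^T \in \mathcal{M}^s$,

\[ M^T(I) = I \circ M \]

for all $X^s \in \mathcal{X}^s$.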

6.2.2 The Generalized Capability Quantifier. Now that the generalized sets of invariants on unsigned sets, $\mathcal{I}$, and on signed sets, $\mathcal{I}^s$, have been defined, the generalized quantifier of ANN capabilities can be defined. However, first recall the definition of a family, $\mathcal{F}^d_\kappa$, of ANNs,

\[ \mathcal{F}^d_\kappa = \{ A = \{h_1, h_2, h_3, \ldots, h_\kappa\} \subset \mathcal{A}^d \}, \]

where $\mathcal{A}^d$ is the set of all arrangements in $d$ dimensions.

Definition. Given $d \in \mathbb{N}$. Let $\mathcal{J} \subset \mathcal{I}$ be a subset of invariants. The generalized quantifier of ANN capabilities with respect to $\mathcal{J}$, $\nu_{\mathcal{J}}$, is a mapping

\[ \nu_{\mathcal{J}} : \mathcal{P}(\mathcal{A}^d) \to \mathbb{Z}. \]

Definition. Given $d \in \mathbb{N}$. Let $\mathcal{J}^s \subset \mathcal{I}^s$ be a subset of invariants defined on signed sets. The generalized quantifier of ANN capabilities with respect to $\mathcal{J}^s$, $\nu^s_{\mathcal{J}}$, is a mapping

\[ \nu^s_{\mathcal{J}} : \mathcal{P}(\mathcal{A}^d) \to \mathbb{Z}. \]

Note that it also makes sense to require that $\nu$ and $\nu^s$ be semi-measures. In fact, since the invariants are semi-measures, one can expect that if $\mathcal{F}^d_\kappa$ is empty, then $\nu(\mathcal{F}^d_\kappa) = 0$. Additionally, it will be required that $\nu(\mathcal{F}) \ge 0$ for all $\mathcal{F} \subset \mathcal{A}^d$ for any $d$. (Where it is clear, the subscript on $\nu$ will be dropped.)

Consider the reformulation of the V-C dimension of $\mathcal{F}^d_\kappa$. It has already been established that the V-C dimension of a set of hyperplane arrangements is the invariant $\mathrm{card}(\mathcal{C}(A))$, the number of chambers of the arrangement $A \in \mathcal{F}^d_\kappa$, which can be analytically determined from the characteristic polynomial as $\pi(A, 1)$. In the context of the generalization, $\mathrm{card}(\mathcal{C}(A)) \in \mathcal{I}$ and $\nu(\mathcal{F}^d_\kappa) = \max\{ \mathrm{card}(\mathcal{C}(A)) : A \in \mathcal{F}^d_\kappa \} = VC(\mathcal{F}^d_\kappa)$. It is important to recall at this point that the objective of this dissertation is to characterize $\nu^s$, since it will be shown that the analysis of capabilities is more appropriately defined about signed sets instead of arbitrary sets.
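As a small numerical illustration of a quantifier built from the chamber count, the sketch below uses the standard closed form for the number of chambers of $n$ hyperplanes in general position in $\mathbb{R}^d$, namely $\sum_{i=0}^{d} \binom{n}{i}$. The general-position restriction and the function names are assumptions made here; the text itself obtains the count from the characteristic polynomial.

```python
from math import comb

def chambers_general_position(n, d):
    """Number of chambers cut out by n hyperplanes in general position in R^d."""
    return sum(comb(n, i) for i in range(d + 1))

def nu(hyperplane_counts, d):
    """Quantifier sketch: the largest chamber count over a family of arrangements.

    Each arrangement is represented only by its number of hyperplanes and is
    assumed to be in general position. An empty family is assigned 0, matching
    the semi-measure requirement stated above.
    """
    return max((chambers_general_position(n, d) for n in hyperplane_counts),
               default=0)

# Three lines in general position in the plane bound 7 regions.
print(chambers_general_position(3, 2))
print(nu([1, 2, 3], 2))
```

Arrangements that are not in general position have fewer chambers, so this closed form is an upper bound for them rather than an exact count.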

6.2.3 The Generalized Partial Ordering and Resulting Lattice of ANNs. Given the generalized capability quantifiers, $\nu$ and $\nu^s$, define the orderings, $\le_{\nu}$ and $\le_{\nu^s}$, on $\mathcal{A}^d$ as follows.

Definition. Given $A_1, A_2 \in \mathcal{A}^d$, $A_1 \le_{\nu^s} A_2$ if and only if $\nu^s(A_1) \le \nu^s(A_2)$. (Similarly, for $\le_{\nu}$.)

Note that these are not, in general, partial orderings since they are not antisymmetric. However, it is possible to produce a partial ordering using equivalence classes of hyperplane arrangements.

Definition. Given $A \in \mathcal{F}^d_\kappa$, define the equivalence class of $A$, denoted $[A]$, as

\[ [A] = \{ B \in \mathcal{F}^d_\kappa : \nu^s(A) = \nu^s(B) \}. \]

Definition. Given $\mathcal{F}^d_\kappa$, define the collection of equivalence classes of $\mathcal{F}^d_\kappa$, denoted $\mathcal{E}(\mathcal{F}^d_\kappa)$, as

\[ \mathcal{E}(\mathcal{F}^d_\kappa) = \{ [A] : A \in \mathcal{F}^d_\kappa \}. \]

Now the ordering, $\le_{\nu^s}$ (and $\le_{\nu}$), can be defined on the collection of equivalence classes, $\mathcal{E}(\mathcal{F}^d_\kappa)$.

Definition. Given $\mathcal{E}(\mathcal{F}^d_\kappa)$, then $[A] \le_{\nu^s} [B]$ if and only if $\nu^s([A]) \le \nu^s([B])$. (Similarly, for $\le_{\nu}$.)

Lemma 11 Given $d, \kappa \in \mathbb{N}$, $(\mathcal{E}(\mathcal{F}^d_\kappa), \le_{\nu^s})$ is a partially ordered set.

Proof. Let $d, \kappa \in \mathbb{N}$. Let $[A], [B], [C] \in \mathcal{E}(\mathcal{F}^d_\kappa)$. Note that $\nu^s([A]) = \nu^s([A])$ for all $[A] \in \mathcal{E}(\mathcal{F}^d_\kappa)$. Hence, $[A] \le_{\nu^s} [A]$. In other words, $\le_{\nu^s}$ is reflexive.

Assume $[A] \le_{\nu^s} [B]$ and $[B] \le_{\nu^s} [A]$. This implies $\nu^s([A]) \le \nu^s([B])$ and $\nu^s([B]) \le \nu^s([A])$, implying that $\nu^s([A]) = \nu^s([B])$. Hence, $[A] = [B]$. In other words, $\le_{\nu^s}$ is antisymmetric.

Assume $[A] \le_{\nu^s} [B]$ and $[B] \le_{\nu^s} [C]$. This implies $\nu^s([A]) \le \nu^s([B])$ and $\nu^s([B]) \le \nu^s([C])$, implying that $\nu^s([A]) \le \nu^s([C])$. Hence, $[A] \le_{\nu^s} [C]$. In other words, $\le_{\nu^s}$ is transitive.

Since $\le_{\nu^s}$ is reflexive, antisymmetric, and transitive, $(\mathcal{E}(\mathcal{F}^d_\kappa), \le_{\nu^s})$ is a partially ordered set. □

Note that $(\mathcal{E}(\mathcal{F}^d_\kappa), \le_{\nu})$ is also a partially ordered set.


Now, enough structure has been put into place to define the lattice of feed-forward, single hidden-layer, perceptron artificial neural networks. This is the crux of the generalization theory that allows analysis of ANN capabilities based on invariants. The lattice structure provides evidence that the generalized approach provides a well-structured, well-behaved environment in which to couch capability quantifiers. Consider the following definitions of meet and join on $\mathcal{E}(\mathcal{F}^d_\kappa)$. Let $[A], [B] \in \mathcal{E}(\mathcal{F}^d_\kappa)$. Then,

\[
[A] \wedge [B] = \mathrm{glb}\{[A],[B]\} =
\begin{cases}
[A] & \text{if } \nu^s([A]) \le \nu^s([B]) \\
[B] & \text{if } \nu^s([B]) \le \nu^s([A])
\end{cases}
\]

and

\[
[A] \vee [B] = \mathrm{lub}\{[A],[B]\} =
\begin{cases}
[A] & \text{if } \nu^s([B]) \le \nu^s([A]) \\
[B] & \text{if } \nu^s([A]) \le \nu^s([B]).
\end{cases}
\]

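Theorem 17 For each $\kappa, d \in \mathbb{N}$, $(\mathcal{E}(\mathcal{F}^d_\kappa), \le_{\nu^s}, \wedge, \vee)$ is a lattice.

Proof. Note that $(\mathcal{E}(\mathcal{F}^d_\kappa), \le_{\nu^s})$ is a partially ordered set by Lemma 11. By construction of $\wedge$ and $\vee$, $\mathcal{E}(\mathcal{F}^d_\kappa)$ is closed with respect to $\wedge$ and $\vee$. Hence $(\mathcal{E}(\mathcal{F}^d_\kappa), \le_{\nu^s}, \wedge, \vee)$ is a lattice for all $\kappa, d \in \mathbb{N}$. □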
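The construction of equivalence classes, meet, and join can be sketched in a few lines. This is an illustration only: the arrangement labels and the quantifier values in `nu` are hypothetical placeholders, not values computed anywhere in this work.

```python
from collections import defaultdict

def equivalence_classes(arrangements, nu):
    """Group arrangements by quantifier value; each group is one class [A]."""
    classes = defaultdict(set)
    for a in arrangements:
        classes[nu[a]].add(a)
    return {value: frozenset(members) for value, members in classes.items()}

def meet(cls_a, cls_b, value_of):
    """[A] ^ [B]: the class with the smaller quantifier value."""
    return cls_a if value_of[cls_a] <= value_of[cls_b] else cls_b

def join(cls_a, cls_b, value_of):
    """[A] v [B]: the class with the larger quantifier value."""
    return cls_a if value_of[cls_b] <= value_of[cls_a] else cls_b

# Hypothetical quantifier values for four arrangements.
nu = {"A1": 2, "A2": 5, "A3": 2, "A4": 7}
classes = equivalence_classes(nu.keys(), nu)
value_of = {members: value for value, members in classes.items()}

low, mid, high = classes[2], classes[5], classes[7]
print(sorted(low), sorted(meet(low, high, value_of)), sorted(join(low, high, value_of)))
```

Because every class carries a single integer value, the classes form a chain, so meet and join reduce to taking the minimum and the maximum. This is why closure under both operations is immediate.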

Note that, again, a similar lattice can be defined based on analysis of unsigned sets. That is, $(\mathcal{E}(\mathcal{F}^d_\kappa), \le_{\nu}, \wedge, \vee)$ is also a lattice. However, to emphasize the point, the lattice based on signed sets is the one of interest in this dissertation.


6.3  Conclusions 

In summary, what has been provided is a method for characterizing ANN capabilities that allows comparison of architectures based on their ability to implement the dichotomies of signed sets. This is formalized by the lattice of ANNs based on a generalized capability quantifier and partial ordering. To facilitate this, a generalized framework has been presented that characterizes the invariance of signed sets. V-C dimension was posed as an instantiation of the generalization; Chapter VII presents another instantiation, the Ox-Cart dimension.

VII. The Ox-Cart Dimension and the Lattice of Artificial Neural Networks

This chapter presents a means of determining the capability of feed-forward, single hidden-layer, perceptron, artificial neural networks to solve classification problems based on the complexity of a signed set, $X^s$. By example, the approach presented here demonstrates the usefulness of the generalized approach defined in Chapter VI. This notion of complexity is captured in a mapping called the geometric complexity, denoted $GC$. The capability quantifier is called the Ox-Cart dimension, denoted $OC(A)$. It is defined on an arrangement, $A$, and on a set of arrangements of hyperplanes generated from an ANN. A partial ordering of ANNs will be defined using the Ox-Cart dimension. The ordering provides a comparison of ANNs based on their ability to solve classification problems determined by the complexity of the problem. Moreover, that ordering will result in a lattice defined on ANNs.

Since the ordering is determined by the complexity of signed sets for which an architecture can provide a solution, the crux of ANN capability analysis is reduced to characterizing this notion of complexity about signed sets. This is achieved through invariant analysis. In particular, $GC$ will be posed as an invariant in the generalized framework defined in Chapter VI and $OC$ as the function which operates on that invariant. Hence, $GC$ is a mapping on signed sets and $OC$ is a mapping on arrangements.

It should be noted that defining the Ox-Cart dimension within the generalized framework directs the capabilities analysis at specific geometric characterizations of classification problems, not the cardinality of arbitrary classification problems. The specificity is determined by the geometric complexity mapping $GC$. This distinction is the major difference between the Ox-Cart dimension and the V-C dimension.

The geometric complexity mapping is centered around the notion that the iterative intersections of the signed sets' convex hulls are an appropriate indicator of the difficulty of the classification problem. It should be emphasized that the Ox-Cart dimension ultimately appeals to the analysis of a particular problem. This differs from V-C dimension based methods, which are based on arbitrary arrangements of sets (not signed sets). By moving the analysis from unsigned sets to signed sets, the methods become more useful in applications, since they provide a tool for directly analyzing an ANN's requirements for solving a classification problem. Moreover, the sophistication of algorithms for deriving the geometric invariants used to determine the Ox-Cart dimension makes evaluating the function accessible (36).

The structure of this chapter parallels that of Chapter VI. Where Chapter VI was general, this chapter will provide an instantiation. First, there will be a discussion of geometric complexity as an invariant and its properties will be investigated. Then, the Ox-Cart dimension will be defined as a particular ANN capability quantifier, $\nu$. A partial ordering about the Ox-Cart dimension will be defined and the lattice that it produces will be established. An example is also presented. Finally, a comparison of the V-C dimension and the Ox-Cart dimension is presented.

7.1 The Geometric Complexity Mapping

This  section  will  provide  a  definition  of  geometric  complexity,  examples,  the 
discontinuity  of  GC,  and  its  invariant  properties. 

7.1.1 Definition of GC. The geometric complexity is a mapping from the set of finite signed sets, $\mathcal{X}^s_f$, to a nonnegative integer. That is,

\[ GC : \mathcal{X}^s_f \to \mathbb{Z}^+. \]

The value of $GC(X^s)$ is indicative of how "mixed up" the dichotomy of $X^s$ is, which should correlate with the difficulty of separating the two sets, $X^+$ and $X^-$, with hyperplanes (difficulty being defined as the number of hyperplanes required to separate the sets). For the purpose of defining $GC(X^s)$, "being mixed up" is mathematically described by evaluating the intersections of the convex hulls of $X^+$ and $X^-$, denoted $\mathrm{co}(X^+)$ and $\mathrm{co}(X^-)$, respectively. This is done iteratively, breaking the classification problem, i.e. the signed set, down starting on the outside and moving inward. The higher the value of $GC$, the more difficult the problem.

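The mapping, $GC$, is constructed from three mappings, $S$, $h$, and $H$.

Define the mapping $S : \mathcal{X}^s_f \to \{0, 1, 2\}$ as

\[
S(X^s) =
\begin{cases}
0 & \text{if } X^s = \emptyset^s \\
1 & \text{if } X^+ = \emptyset \text{ xor } X^- = \emptyset \\
2 & \text{if } X^+ \ne \emptyset \text{ and } X^- \ne \emptyset.
\end{cases}
\]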

$S(X^s)$ indicates whether one or both of the sets $X^+$, $X^-$ are empty. If $S(X^s) = 0$, then no hyperplane is required to separate the empty set. If $S(X^s) = 1$, then no hyperplane is required to implement the dichotomy on $X^s$. However, if $S(X^s) = 2$, more analysis is required to determine complexity. So, form the convex hulls of $X^+$ and $X^-$, that is, $\mathrm{co}(X^+)$ and $\mathrm{co}(X^-)$. If $\mathrm{co}(X^+) \cap \mathrm{co}(X^-) = \emptyset$, then, by the Hahn-Banach Theorem, the set is separable by one hyperplane (31). If $\mathrm{co}(X^+) \cap \mathrm{co}(X^-) \ne \emptyset$, then analysis of the convex hulls must continue in order to determine the complexity within $\mathrm{co}(X^+) \cap \mathrm{co}(X^-)$. To express this mathematically, a binary mapping $h : \mathcal{X}^s_f \to \{0, 1\}$ is required. For $X^s \in \mathcal{X}^s_f$, define

\[
h(X^s) =
\begin{cases}
0 & \text{if } \mathrm{co}(X^+) \cap \mathrm{co}(X^-) = \emptyset \\
1 & \text{if } \mathrm{co}(X^+) \cap \mathrm{co}(X^-) \ne \emptyset.
\end{cases}
\]

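To reduce $X^s$ to the portion that has not been analyzed, define $H : \mathcal{X}^s_f \to \mathcal{X}^s_f$ as

\[ H(X^s) = \left( X^+ \cap [\mathrm{co}(X^+) \cap \mathrm{co}(X^-)],\; X^- \cap [\mathrm{co}(X^+) \cap \mathrm{co}(X^-)] \right). \]

If $X^s = (\emptyset, \emptyset)$, then $H(X^s) = H(\emptyset, \emptyset) = (\emptyset, \emptyset)$. Also, since $H : \mathcal{X}^s_f \to \mathcal{X}^s_f$, the composition, $H \circ H$, is well-defined. For clarity, let $H^k$ denote $H$ composed $k$ times.

Theorem 18 Let $X^s \in \mathcal{X}^s_f$; then there exists a $k$ such that $H^k(X^s) = (\emptyset, \emptyset)$.

Proof. This proof will be by contradiction. That is, for $X^s \in \mathcal{X}^s_f$, assume $H^k(X^s) \ne (\emptyset, \emptyset)$ for all $k \in \mathbb{N}$. Since $X^s$ is finite, this implies that $H^k(X^s) = H^{k-1}(X^s)$ for some $k = k_0$. Let $Z = H^{k_0 - 1}(X^s)$. Then, $H(Z) = Z$. This implies

\[ Z^+ \cap [\mathrm{co}(Z^+) \cap \mathrm{co}(Z^-)] = Z^+ \quad \text{and} \quad Z^- \cap [\mathrm{co}(Z^+) \cap \mathrm{co}(Z^-)] = Z^-. \]

Hence,

\[ \mathrm{co}(Z^+) \cap \mathrm{co}(Z^-) \supseteq Z^+ \quad \text{and} \quad \mathrm{co}(Z^+) \cap \mathrm{co}(Z^-) \supseteq Z^-. \]

So that

\[ \mathrm{co}(\mathrm{co}(Z^+) \cap \mathrm{co}(Z^-)) \supseteq \mathrm{co}(Z^+) \quad \text{and} \quad \mathrm{co}(\mathrm{co}(Z^+) \cap \mathrm{co}(Z^-)) \supseteq \mathrm{co}(Z^-). \]

Therefore, since the intersection of convex sets is convex,

\[ \mathrm{co}(Z^+) \cap \mathrm{co}(Z^-) \supseteq \mathrm{co}(Z^+) \quad \text{and} \quad \mathrm{co}(Z^+) \cap \mathrm{co}(Z^-) \supseteq \mathrm{co}(Z^-), \]

which implies $\mathrm{co}(Z^+) = \mathrm{co}(Z^-)$. The vertices of this common hull belong to both finite sets, implying that $Z^+ \cap Z^- \ne \emptyset$. This is a contradiction. Hence, there must exist $k^* \in \mathbb{N}$ such that $H^{k^*}(X^s) = (\emptyset, \emptyset)$. □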

$H(X^s)$ reduces the domain of the analysis to the portion of the problem that has not been characterized for complexity. In this manner, the problem is picked apart and values of $S(X^s)$ and $h(X^s)$ are accumulated until $X^s$ is reduced to $(\emptyset, \emptyset)$. This process is housed in the function for $GC(X^s)$.

Finally, the geometric complexity of a finite signed set $X^s$ can be defined.

Definition. The geometric complexity of a finite signed set $X^s$, $GC : \mathcal{X}^s_f \to \mathbb{Z}^+$, is defined as

\[
\begin{aligned}
GC(X^s) &= [S(X^s) + h(X^s)] + [S(H(X^s)) + h(H(X^s))] + [S(H(H(X^s))) + h(H(H(X^s)))] + \cdots \\
&= \sum_{k=0}^{\infty} \left[ S(H^k(X^s)) + h(H^k(X^s)) \right].
\end{aligned}
\tag{17}
\]

Theorem 19 For each $X^s \in \mathcal{X}^s_f$, $GC(X^s) < \infty$. Hence, $GC$ is defined on all of $\mathcal{X}^s_f$.

Proof. By Theorem 18, $GC(X^s)$ is a finite sum of integers and is, therefore, finite for all $X^s \in \mathcal{X}^s_f$. □
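The iteration behind $GC$ can be made concrete with a minimal sketch restricted to one dimension, where the convex hull of a finite set is simply the closed interval from its minimum to its maximum. The restriction to $d = 1$ and the function names are assumptions made for illustration; in higher dimensions the hull and intersection steps require computational-geometry routines that are not shown here.

```python
def S(pos, neg):
    """0 if both classes are empty, 1 if exactly one is, 2 if neither is."""
    return (1 if pos else 0) + (1 if neg else 0)

def hull_overlap(pos, neg):
    """co(X+) n co(X-) as a closed interval (lo, hi), or None if it is empty."""
    if not pos or not neg:
        return None
    lo = max(min(pos), min(neg))
    hi = min(max(pos), max(neg))
    return (lo, hi) if lo <= hi else None

def h(pos, neg):
    """1 if the two convex hulls intersect, 0 otherwise."""
    return 0 if hull_overlap(pos, neg) is None else 1

def H(pos, neg):
    """Keep only the points of each class lying in the hull intersection."""
    overlap = hull_overlap(pos, neg)
    if overlap is None:
        return frozenset(), frozenset()
    lo, hi = overlap
    return (frozenset(x for x in pos if lo <= x <= hi),
            frozenset(x for x in neg if lo <= x <= hi))

def GC(pos, neg):
    """Sum S + h over the iterates of H until the signed set is exhausted."""
    pos, neg = frozenset(pos), frozenset(neg)
    if pos & neg:
        raise ValueError("the two classes of a signed set must be disjoint")
    total = 0
    while pos or neg:  # terminates by Theorem 18
        total += S(pos, neg) + h(pos, neg)
        pos, neg = H(pos, neg)
    return total

print(GC({0, 1}, {2, 3}))  # hulls are disjoint intervals
print(GC({0, 3}, {1, 2}))  # one class nested inside the other
print(GC({0, 2}, {1, 3}))  # interleaved classes
```

Terms of the infinite sum after the signed set has been reduced to $(\emptyset, \emptyset)$ are all zero, which is why the loop may stop at that point.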

7.1.2 Examples of GC. Consider the following examples, which demonstrate the sensitivity of $GC$ to different geometric situations. The first set of examples demonstrates that $GC(X^s)$ increases, as does the required number of hyperplanes to implement the dichotomy, irrespective of the cardinality of $X^s$.

Example 1. For $X^s = \emptyset^s$,

\[ GC(X^s) = S(X^s) + h(X^s) = 0 + 0 = 0, \]

since $H(\emptyset, \emptyset) = (\emptyset, \emptyset)$.

Example 2. For $X^s = (X^+, \emptyset)$ or $X^s = (\emptyset, X^-)$,

\[ GC(X^s) = S(X^s) + h(X^s) = 1 + 0 = 1, \]

again since $H(X^s) = (\emptyset, \emptyset)$.

Figure 6. Example 4, GC(X)=3.

Example 3. For $X^s = (X^+, X^-)$, where $X^+ \ne \emptyset$ and $X^- \ne \emptyset$, but $\mathrm{co}(X^+) \cap \mathrm{co}(X^-) = \emptyset$,

\[ GC(X^s) = S(X^s) + h(X^s) = 2 + 0 = 2, \]

and again $H(X^s) = (\emptyset, \emptyset)$.

These three examples demonstrate in a very simple way how the geometric complexity of sets is determined.

The following examples will help explore the more interesting aspects of the mapping. Consider the sets and $GC$ evaluations pictured in Figures 6 through 9.

Example 4. For $X^s$, as shown in Figure 6,

\[ GC(X^s) = S(X^s) + h(X^s) = 2 + 1 = 3. \]

Figure 7. Example 5, GC(X)=4.

Example 5. For $X^s$, as shown in Figure 7,

\[ GC(X^s) = S(X^s) + h(X^s) + S(H(X^s)) + h(H(X^s)) = 2 + 1 + 1 + 0 = 4. \]

Example 6. For $X^s$, as shown in Figure 8,

\[ GC(X^s) = S(X^s) + h(X^s) + S(H(X^s)) + h(H(X^s)) = 2 + 1 + 2 + 0 = 5. \]

Example 7. For $X^s$, as shown in Figure 9,

\[ GC(X^s) = \sum_{k=0}^{\infty} \left[ S(H^k(X^s)) + h(H^k(X^s)) \right] = 2 + 1 + 2 + 1 + 2 + 1 = 9. \]

Figure 8. Example 6, GC(X)=5.

Figure 9. Example 7, GC(X)=9.

These examples demonstrate sets of increasing difficulty for ANN implementation, which is reflected in the increasing values of $GC$. Once again, note that, in these examples, V-C dimension based quantifiers would grow disproportionately to the number of hyperplanes required to separate the points. This is simply because V-C analysis would be completed on the $2^n$ colored sets. Moreover, since the Ox-Cart dimension will be defined on signed sets, a comparison of the values of V-C dimension and the Ox-Cart dimension is meaningless. However, a comparison of the mappings of V-C dimension and the Ox-Cart dimension will be given in the chapter summary. This establishes that, in order to get an accurate picture of what is required of an ANN to solve a problem, V-C based quantifiers may be ambiguous. An approach such as $GC$ may be more indicative of the true hyperplane arrangement requirements to implement the dichotomy of a signed set.

From these examples, it can be seen that, in some respects, $GC$ is cardinality insensitive. For instance, the value of $GC$ for Examples 1 and 2 would not change if the cardinality of $X^s$ changed. This is also true in the other examples, unless the set were changed so that the number of convex hull decomposition iterations increased, in which case $GC$ would also change. Also, note that the function $GC$ can theoretically be applied regardless of dimension. However, dimension affects the algorithms used to compute convex hulls and intersections.

7.1.3 Further Analysis of GC. It would seem intuitive that another mapping similar to $GC$ could be derived, mimicking Sontag's use of the Hausdorff metric to introduce continuity. However, continuity with respect to the Hausdorff metric represents a function's sensitivity to changes in the distance between the points in the signed set. This is inappropriate for $GC$ since it is valued on geometrical concepts of signed sets. That is, a finite signed set's geometric complexity is not inherently dependent on the distances between its points.

Figure 10. Example 8, the XOR problem.

Now, consider three examples (illustrated in Figures 10, 11 and 12) that demonstrate the sensitivity of $GC$ with respect to changes in the geometry of the signed sets. $GC$ gives ambiguous results for Examples 8 and 9.

Example 8. The classic exclusive or (XOR) problem (Figure 10) demonstrates that as the cardinality of the set increases, $GC$ will also increase. However, only two hyperplanes will be required to separate the points.

Example 9. For Baum's example of the alternating n-gon problem (Figure 11), regardless of the cardinality of the set, $GC$ will always be 2. However, the required number of lines increases as the cardinality of the set increases (see Theorem 10).

Example 10. The rings problem (Figure 12) shows that cardinality can be increased and $GC$ will remain constant. This reflects the fact that the number of hyperplanes required will not grow with cardinality.

It would appear that changing $GC$ to include a count of the vertices of the convex hulls (the cardinality of the extremal sets) could bring the value of $GC$ more in line with the solution to the n-gon problem. However, the $GC$ value for the rings example would then be inappropriate. In conclusion, in order to build a $GC$ mapping that gives unambiguous results for all geometric situations, the algorithm would in essence have to determine exactly how many lines are required and where to place them, which is the job of the ANN in the first place. Doing so would defeat the purpose, not to mention being computationally unreasonable.

Figure 11. Example 9, the n-gon problem.

Figure 12. Example 10, the rings problem.

Figure 13. Example 11, the checkerboard problem.

Consider one additional geometric situation, the checkerboard example.

Example 11. For the checkerboard problem, as shown in Figure 13, it is evident once again how it is possible for $GC$ to grow unreasonably as the cardinality of the set grows.

7.1.4 GC is an Invariant. Mathematically, $GC$ is a mapping from a signed set to a nonnegative integer $z \in \mathbb{Z}^+$,

\[ GC : \mathcal{X}^s_f \to \mathbb{Z}^+. \]

The claim is that $GC \in \mathcal{I}^s$ for any $A \in \mathcal{A}^d$, for any $d$. To prove this, $GC$ must be shown to be a semi-measure and to be invariant with respect to $D_\gamma$, $T_{x_0}$, and $W_A$.

Consider the following lemma, which establishes that $GC$ is a semi-measure.

Lemma 12 $GC$ is a semi-measure, not a measure.

Proof. By definition, $GC(\emptyset^s) = 0$ and $GC(X^s) \ge 0$ for all $X^s \in \mathcal{X}^s_f$. Hence, $GC$ is a semi-measure.

However, in general,

\[ GC\Big(\bigcup_i X^s_i\Big) \ne \sum_i GC(X^s_i) \]

for a finite number of signed sets $X^s_i \in \mathcal{X}^s_f$. Consider two signed sets $X^s_1$ and $X^s_2$ in $\mathbb{R}^2$ such that both can be separated by the x-axis. Moreover, let $X^s_1$ and $X^s_2$ be such that $X^+_1$ and $X^+_2$ lie above the x-axis and $X^-_1$ and $X^-_2$ lie below the x-axis. Then, $GC(X^s_1) = GC(X^s_2) = 2$. However, $GC(X^s_1 \cup X^s_2) = 2 \ne 4$. □
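The failure of additivity can be made concrete with specific points. The coordinates below are chosen here purely for illustration and do not come from the text.

```latex
% Illustrative coordinates (assumed): two signed sets separated by the x-axis.
\begin{align*}
X_1^s &= \big(\{(0,1)\},\ \{(0,-1)\}\big), &
GC(X_1^s) &= S(X_1^s) + h(X_1^s) = 2 + 0 = 2, \\
X_2^s &= \big(\{(1,1)\},\ \{(1,-1)\}\big), &
GC(X_2^s) &= S(X_2^s) + h(X_2^s) = 2 + 0 = 2. \\
\intertext{For the union, $\mathrm{co}(X^+)$ is the segment from $(0,1)$ to $(1,1)$ and
$\mathrm{co}(X^-)$ is the segment from $(0,-1)$ to $(1,-1)$, which are disjoint, so}
X_1^s \cup X_2^s &= \big(\{(0,1),(1,1)\},\ \{(0,-1),(1,-1)\}\big), &
GC(X_1^s \cup X_2^s) &= 2 + 0 = 2 \ne 4 = GC(X_1^s) + GC(X_2^s).
\end{align*}
```

The union is still separable by a single hyperplane, so its complexity does not accumulate the complexities of its parts.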

$GC$ is advertised as an invariant. What exactly is invariant about it? The hope was that only changes in the complexity of the geometry of a signed set would effect a change in the value of $GC$. Hence, it would be expected that the mapping would be invariant to translational, rotational, and scale variations of $X^s$. Consider the following lemmas.

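Lemma 13 $GC(X^s) = GC(T_{x_0}(X^s))$ for all $X^s \in \mathcal{X}^s_f$ and $x_0 \in \mathbb{R}^d$.

Proof. Since $GC$ is simply a sum of integers dictated by the mappings $S(X^s)$ and $h(X^s)$, the proof is reduced to showing that these mappings are invariant for any signed set. Note that the results of $H$ actually do change. However, it is the effect on $S$ and $h$ that matters to the value of $GC$.

Hence, consider $S(T_{x_0}(X^s))$. If $X^s = \emptyset^s$, then $S(X^s) = 0$. Clearly, $T_{x_0}(\emptyset) = \emptyset$ and $S(T_{x_0}(\emptyset)) = 0$.

Similarly, for $X^s \in \mathcal{X}^s_f$ such that $S(X^s) = 1$, $X^s = (X^+, \emptyset)$ or $X^s = (\emptyset, X^-)$. Then, $S(T_{x_0}(X^s)) = S(T_{x_0}(X^+), \emptyset) = 1$ or $S(T_{x_0}(X^s)) = S(\emptyset, T_{x_0}(X^-)) = 1$.

Finally, for $X^s \in \mathcal{X}^s_f$ such that $S(X^s) = 2$, $X^+ \ne \emptyset$ and $X^- \ne \emptyset$. Therefore, $T_{x_0}(X^+) \ne \emptyset$ and $T_{x_0}(X^-) \ne \emptyset$. Then, $S(T_{x_0}(X^s)) = S(T_{x_0}(X^+), T_{x_0}(X^-)) = 2$. Hence, $S(X^s) = S(T_{x_0}(X^s))$.

Now consider $h(T_{x_0}(X^s))$. Then, $h(T_{x_0}(X^s)) = h(T_{x_0}(X^+), T_{x_0}(X^-))$. Notice that

\[
\begin{aligned}
\mathrm{co}(T_{x_0}(X^+)) \cap \mathrm{co}(T_{x_0}(X^-)) &= T_{x_0}(\mathrm{co}(X^+)) \cap T_{x_0}(\mathrm{co}(X^-)) \\
&= T_{x_0}(\mathrm{co}(X^+) \cap \mathrm{co}(X^-))
\end{aligned}
\]

by Lemma 8. If $\mathrm{co}(X^+) \cap \mathrm{co}(X^-) = \emptyset$, then $T_{x_0}(\mathrm{co}(X^+) \cap \mathrm{co}(X^-)) = \emptyset$, and if $\mathrm{co}(X^+) \cap \mathrm{co}(X^-) \ne \emptyset$, then $T_{x_0}(\mathrm{co}(X^+) \cap \mathrm{co}(X^-)) \ne \emptyset$. Hence, $h(X^s) = h(T_{x_0}(X^s))$.

The same identity shows that the points retained by $H$ are carried to their translates, so the argument applies to every iterate $H^k(X^s)$. By combining, $GC(X^s) = GC(T_{x_0}(X^s))$. □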

Lemma 14 $GC(X^s) = GC(D_\gamma(X^s))$ for all $X^s \in \mathcal{X}^s_f$ and $\gamma \in \mathbb{R}^+$.

Proof. The proof is similar to that of Lemma 13, relying on Lemma 7. □

Lemma 15 $GC(X^s) = GC(W_A(X^s))$ for all $X^s \in \mathcal{X}^s_f$ and $A \in U$.

Proof. The proof is similar to that of Lemma 13, relying on Lemma 9. □

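Enough material has been given to claim that $GC$ is truly an invariant. Consider the following theorem.

Theorem 20 $GC \in \mathcal{I}^s$ with respect to the set of mappings $\mathcal{M}^s$.

Proof. By Lemma 12, $GC$ is a semi-measure, and by Lemmas 13, 14, and 15, it is unchanged by every mapping in $\mathcal{M}^s$. Hence $GC \in \mathcal{I}^s$. That is, $GC$ is an invariant. □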
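The invariance can be spot-checked numerically in one dimension, where a convex hull is a closed interval. This is a sketch under stated assumptions: `gc_1d` is a compact, hypothetical one-dimensional reading of the $GC$ sum, the sample signed set is made up, and reflection through the origin is used as the only non-identity orthogonal map available on the line.

```python
def gc_1d(pos, neg):
    """Geometric complexity of a signed set of points on the real line."""
    pos, neg = set(pos), set(neg)
    total = 0
    while pos or neg:
        total += bool(pos) + bool(neg)  # S: number of nonempty classes
        if pos and neg:
            lo = max(min(pos), min(neg))
            hi = min(max(pos), max(neg))
        else:
            lo, hi = 1, 0               # encode an empty hull intersection
        total += lo <= hi               # h: 1 if the hulls intersect
        pos = {x for x in pos if lo <= x <= hi}  # H: keep points in the overlap
        neg = {x for x in neg if lo <= x <= hi}
    return total

# Hypothetical signed set with interleaved classes.
X_pos, X_neg = {0, 2, 4}, {1, 3, 5}

base = gc_1d(X_pos, X_neg)
shifted = gc_1d({x + 7 for x in X_pos}, {x + 7 for x in X_neg})   # translation
scaled = gc_1d({3 * x for x in X_pos}, {3 * x for x in X_neg})    # positive dilation
mirrored = gc_1d({-x for x in X_pos}, {-x for x in X_neg})        # reflection
print(base, shifted, scaled, mirrored)
```

A check of this kind is evidence on one sample, not a proof; it only illustrates what the lemmas assert for every finite signed set.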

7.2 Definition of the Ox-Cart Dimension

In this section, the Ox-Cart dimension will be defined along with the ordering induced by the Ox-Cart dimension. It will be shown that each of these is an instantiation of the general framework. That is, the Ox-Cart dimension is an instantiation of $\nu^s$ and has the corresponding ordering facilitating the construction of a lattice. Hence, through the Ox-Cart dimension, ANN architectures can be compared or ordered based on their ability to handle different levels of geometric complexity.

In accordance with the general framework defined in Chapter VI, the Ox-Cart dimension will be defined as a function of the invariant $GC$. Let $A \in \mathcal{A}^d$ for some $d \in \mathbb{Z}^+$. Then, the Ox-Cart dimension, $OC$, is a mapping from one specific arrangement to a positive integer. That is,

\[ OC : \mathcal{A}^d \to \mathbb{Z}^+. \]

Let $\mathcal{X}^s_A \subset \mathcal{X}^s_f$ be the collection of signed sets which can be dichotomized by $A$.

Definition. The OC dimension of $A$ is defined by

\[ OC(A) = \sup\{ GC(X^s) : X^s \in \mathcal{X}^s_A \}. \tag{18} \]

This is a quantifier on an arrangement of hyperplanes generated by an ANN. Define the ordering, $\preceq_{OC}$, of arrangements as follows. Let $A_1, A_2 \in \mathcal{A}^d$. Then, $A_1 \preceq_{OC} A_2$ if and only if $OC(A_1) \le OC(A_2)$. Note that this is not antisymmetric. Hence, $\preceq_{OC}$ is not a partial ordering. However, in the following section a partial ordering is defined using equivalence classes.

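7.3 The Lattice of Feed-forward, Single Hidden-Layer, Perceptron Artificial Neural Networks based on the Ox-Cart Dimension

Recall the definition of $\mathcal{F}^d_\kappa$,

\[ \mathcal{F}^d_\kappa = \{ A = \{h_1, h_2, h_3, \ldots, h_\kappa\} \subset \mathcal{A}^d \}. \]

Consider the following definitions of equivalence classes defined specifically about the Ox-Cart dimension. (Boldface is used to indicate that the Ox-Cart dimension is being defined on a set of arrangements.)

Definition. Given $A \in \mathcal{F}^d_\kappa$, define the equivalence class of $A$, denoted $[A]$, as

\[ [A] = \{ B \in \mathcal{F}^d_\kappa : OC(A) = OC(B) \}. \]

Definition. Given $\mathcal{F}^d_\kappa$, define the collection of equivalence classes of $\mathcal{F}^d_\kappa$, denoted $\mathcal{E}(\mathcal{F}^d_\kappa)$, as

\[ \mathcal{E}(\mathcal{F}^d_\kappa) = \{ [A] : A \in \mathcal{F}^d_\kappa \}. \]

Let $\mathcal{X}^s_{\mathcal{F}^d_\kappa} \subset \mathcal{X}^s_f$ be the set of signed sets which can be implemented by some $A \in \mathcal{F}^d_\kappa$. Then, the O-C dimension on the collection of arrangements, $\mathcal{F}^d_\kappa$, can be defined by

\[ \mathbf{OC}(\mathcal{F}^d_\kappa) = \sup\{ GC(X^s) : X^s \in \mathcal{X}^s_A \text{ for all } A \in \mathcal{F}^d_\kappa \}. \tag{19} \]

The ordering, $\preceq_{OC}$, can be defined on the collection of equivalence classes of hyperplane arrangements, $\mathcal{E}(\mathcal{F}^d_\kappa)$, as follows. Let $[A], [B] \in \mathcal{E}(\mathcal{F}^d_\kappa)$. Then, $[A] \preceq_{OC} [B]$ if and only if $\mathbf{OC}([A]) \le \mathbf{OC}([B])$.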


Lemma 16 $(\mathcal{E}(\mathcal{F}^d_\kappa), \preceq_{OC})$ is a partially ordered set for all $d \in \mathbb{N}$.

Proof. By Lemma 11. □


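Structure has been put into place to define the lattice of feed-forward, single hidden-layer, perceptron artificial neural networks based on the Ox-Cart dimension. Consider the following definitions of meet and join on $\mathcal{E}(\mathcal{F}^d_\kappa)$. Let $[A], [B] \in \mathcal{E}(\mathcal{F}^d_\kappa)$. Then,

\[
[A] \wedge [B] = \mathrm{glb}\{[A],[B]\} =
\begin{cases}
[A] & \text{if } \mathbf{OC}([A]) \le \mathbf{OC}([B]) \\
[B] & \text{if } \mathbf{OC}([B]) \le \mathbf{OC}([A])
\end{cases}
\]

and

\[
[A] \vee [B] = \mathrm{lub}\{[A],[B]\} =
\begin{cases}
[A] & \text{if } \mathbf{OC}([B]) \le \mathbf{OC}([A]) \\
[B] & \text{if } \mathbf{OC}([A]) \le \mathbf{OC}([B]).
\end{cases}
\]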

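Theorem 21 For each $d, \kappa \in \mathbb{N}$, $(\mathcal{E}(\mathcal{F}^d_\kappa), \preceq_{OC}, \wedge, \vee)$ is a lattice.

Proof. Let $d, \kappa \in \mathbb{N}$; then $(\mathcal{E}(\mathcal{F}^d_\kappa), \preceq_{OC})$ is a partially ordered set by Lemma 16. By the definition of $\wedge$ and $\vee$, $\mathcal{E}(\mathcal{F}^d_\kappa)$ is closed with respect to $\wedge$ and $\vee$. Hence, $(\mathcal{E}(\mathcal{F}^d_\kappa), \preceq_{OC}, \wedge, \vee)$ is a lattice. □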

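Consider the following theorem.

Theorem 22 If $\mathbf{f}, \mathbf{g} \subset \mathcal{A}^d$ such that $\mathbf{f} \subset \mathbf{g}$, then $\mathbf{f} \preceq_{OC} \mathbf{g}$ for all $d \in \mathbb{N}$.

Proof. Let $\mathbf{f}, \mathbf{g} \subset \mathcal{A}^d$. Assume $\mathbf{f} \subset \mathbf{g}$. Note that $\mathbf{f} \subset \mathbf{g}$ implies that $\mathcal{X}^s_{\mathbf{f}} \subset \mathcal{X}^s_{\mathbf{g}}$. Hence

\[ \sup\{ GC(X^s) : X^s \in \mathcal{X}^s_A,\ \forall A \in \mathbf{f} \} \le \sup\{ GC(X^s) : X^s \in \mathcal{X}^s_A,\ \forall A \in \mathbf{g} \}. \]

Therefore, $\mathbf{OC}(\mathbf{f}) \le \mathbf{OC}(\mathbf{g})$. In other words, $\mathbf{f} \preceq_{OC} \mathbf{g}$ for all $d \in \mathbb{N}$. □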
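The monotonicity in Theorem 22 is the familiar fact that a supremum over a larger collection cannot be smaller. The sketch below illustrates it with made-up data: the arrangement labels and the lists of attainable $GC$ values are hypothetical, and the family-level quantifier is read, following the prose, as ranging over signed sets implemented by some member of the family.

```python
def oc_family(family, gc_reachable):
    """Largest GC value over signed sets implemented by some arrangement in the family."""
    values = [v for a in family for v in gc_reachable[a]]
    return max(values, default=0)

# Hypothetical GC values of the signed sets each arrangement can implement.
gc_reachable = {
    "A1": [0, 1, 2],
    "A2": [0, 1, 2, 4],
    "A3": [0, 1, 2, 4, 5],
}

f = {"A1", "A2"}
g = {"A1", "A2", "A3"}  # f is a subset of g

print(oc_family(f, gc_reachable), oc_family(g, gc_reachable))
```

Enlarging the family can only add candidate signed sets, so the value for `g` is never below the value for `f`.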

In summary, what has been provided is a method for characterizing ANN capabilities that allows comparison of architectures based on the geometric complexity of signed sets.

7.3.1 An Example Using $\preceq_{OC}$. Consider the following example, which analyzes the difference in capability of ANNs that have nonzero valued thresholds at each node in the hidden-layer and those that have zero valued thresholds.

Consider $\mathcal{F}^2_\kappa$. Define the subsets $\mathbf{f}_\kappa, \mathbf{g}_\kappa \subset \mathcal{F}^2_\kappa$ by

\[ \mathbf{f}_\kappa = \Big\{ A \in \mathcal{F}^2_\kappa : A = \bigcup_{i=1}^{\kappa} l_i \text{ where } l_i = \{ x \in \mathbb{R}^2 : L_i(x) = u_i \cdot x = 0 \} \Big\} \]

and

\[ \mathbf{g}_\kappa = \Big\{ A \in \mathcal{F}^2_\kappa : A = \bigcup_{i=1}^{\kappa} l_i \text{ where } l_i = \{ x \in \mathbb{R}^2 : L_i(x) = u_i \cdot x - \tau_i = 0 \} \Big\}. \]

Note that $\mathbf{f}_\kappa \cup \mathbf{g}_\kappa = \mathcal{F}^2_\kappa$ and $\mathbf{f}_\kappa \subset \mathbf{g}_\kappa$ for all $\kappa$. Therefore, by Theorem 22, $\mathbf{f}_\kappa \preceq_{OC} \mathbf{g}_\kappa$ for all $\kappa$.

Hence, for given $\kappa$, the set of ANNs that have a nonzero threshold can solve a classification problem with a $GC$ value at least as high as the set of ANNs with zero valued thresholds. This is an intuitive demonstration of the use of the ordering $\preceq_{OC}$. Consider Examples 4-7 presented in Section 7.1.2. Note that as the $GC$ value increases from example to example, implementing the dichotomy requires an increasing number of hyperplanes. In other words, $\kappa$ must increase. Figures 14 and 15 demonstrate the required increase in hyperplanes as the geometry changes. The conclusion would be that if the $GC$ value of a signed set is large, it may be more appropriate to use an ANN with threshold values at each node in the hidden-layer, even though thresholds are an additional parameter which must be learned.

7.4 A Comparison of the Ox-Cart Dimension and the V-C Dimension

As mentioned previously, it is not appropriate to compare the values of the Ox-Cart dimension and the V-C dimension. However, the mappings can be compared in the context of the general framework described in Chapter VI. Both quantifiers are defined as functions of invariants. V-C dimension is defined on the cardinality invariant, which is defined on unsigned sets. The Ox-Cart dimension is defined on the geometric complexity invariant, which is defined on signed sets.

Since the V-C dimension is defined on arbitrary arrangements of unsigned sets, it can only characterize the ability to solve the worst-case geometric arrangement. The Ox-Cart dimension, however, is defined more specifically: it characterizes the ability to solve any arrangement of signed sets with a given geometric complexity. This is because the geometric complexity invariant is more specific than cardinality.
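
The worst-case character of shattering can be seen in a small sketch. It assumes the simplest possible classifier, a single oriented threshold on the real line, and the function names are hypothetical. The code enumerates every dichotomy of a point set that such a classifier can implement and checks whether all of them are obtained.

```python
def realizable_dichotomies(points):
    """Labelings of distinct 1-D points produced by sign(w * (x - t)), w = +1 or -1."""
    xs = sorted(points)
    # One candidate threshold in each gap, plus one beyond each end, is enough.
    thresholds = [xs[0] - 1.0]
    thresholds += [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    thresholds.append(xs[-1] + 1.0)
    labelings = set()
    for t in thresholds:
        for w in (+1, -1):
            labelings.add(tuple(w if x > t else -w for x in points))
    return labelings


def is_shattered(points):
    """True if every one of the 2^n dichotomies of the points is realizable."""
    return len(realizable_dichotomies(points)) == 2 ** len(points)


print(is_shattered([1.0, 2.0]))
print(is_shattered([1.0, 2.0, 3.0]))
print((+1, -1, +1) in realizable_dichotomies([1.0, 2.0, 3.0]))
```

Two points are shattered and three are not, so the V-C dimension of this classifier is 2. That single number records only that some two-point set can be shattered; it does not distinguish the six three-point dichotomies that can be implemented from the two alternating ones that cannot.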

7.5 Conclusions

In  summary,  this  chapter  has  defined  an  alternative  view  of  capabilities  analysis 
on  ANNs  using  the  generalized  framework  of  Chapter  VI.  The  Ox-Cart  dimension 
was  defined  based  on  an  invariant,  geometric  complexity.  It  was  shown  that  the 
Ox-Cart  dimension  induces  an  ordering  that  results  in  a  lattice.  This  lattice  is 
defined  as  the  lattice  of  feed-forward,  single  hidden-layer,  perceptron,  artificial  neural 
networks. 

VIII. Follow-On Research

An obvious extension of this research is the investigation of other ANN architectures. Combinatorial geometry does address arrangements of structures other than hyperplanes. In fact, Grünbaum has provided mathematical background for projecting arbitrary curves into a projective space where they behave as hyperplanes (21). In addition, Zaslavsky defines the geometric lattice L(A) for arrangements of hyperplanes in the projective space with the same ordering as for Euclidean space. Hence, the same structure and results are available to entities which can be projected into the Euclidean space. This makes the research in this dissertation applicable in a more general setting.

Moreover, once the transformation is made into a projective space, a trade-off analysis can be performed. Applying the quantifiers defined by $\nu$ to direct learning algorithms would be a natural way to build on this work. In particular, the quantifiers could be used to compare the power of adding more complex boundaries, i.e., non-linear decision regions. In other words, the trade-off of additional ANN parameters versus ANN capabilities could be investigated. This extension could lead to more efficient architectures and learning algorithms by continuously evaluating the benefit or need of increasing the complexity of the decision boundaries.

Also, since this research defined the cut-intersection semi-lattice for ANNs, much can be discovered about the behavior and structure of the beast. This research addressed specifically the area of determining capabilities. Estimates of computational efficiency can also be derived in the framework of lattice theory.

Currently, the ANN weight space (the set of all vectors that can instantiate a trained ANN) is theoretically assumed to be continuous. In reality, the weight space should depend on the resolution of the data. By restricting the search space to a given resolution, the efficiency of the ANN should increase, and this should be reflected in the mapping that determines the capability. Baum alludes to this fact in his 1987 work (9).

IX. Summary

The  intention  of  this  research  was  to  provide  a  broad  base  of  mathematics  on 
which  to  study,  measure,  and  improve  the  capabilities  of  artificial  neural  networks. 
This  was  accomplished  for  a  specific  set  of  architectures:  feed-forward,  single  hidden- 
layer,  perceptron  ANNs.  The  approach  taken  for  characterizing  capabilities  was  very 
different from any method described in current literature. The resultant method exhibits new properties whose absence has proved to be the downfall of other methods. Along the way, interesting extensions of old methods were formalized and
important  links  were  made  between  old  methods  and  the  new  approach  proposed  in 
this  research. 

The approach taken to accomplish the task was to:

1.  Understand  and  clarify  current  methods; 

2. Determine what mathematical structure an ANN exhibits; for example, geometrically, a cut-intersection semi-lattice was established;

3.  Propose  new  methods  and  demonstrate  their  usefulness. 

The premise was that measures based on V-C dimension concepts lacked properties that allowed for a meaningful assessment of capabilities. The notion of shattering is a hard, strict requirement that actually only measures an ANN's ability to accomplish the absolute worst case of a geometric arrangement of data points. Moreover, the requirement that only at least one arrangement need be shattered further limits the ability to generalize capabilities over all possible geometric situations. The reason for this limited approach was that little mathematical structure has been formally afforded to ANN architectures. In fact, there is a rich set of mathematics (specifically, lattice theory and invariant analysis derived from combinatorial geometry) whose results can be applied directly once an ANN architecture structure is in place.

The study of current attempts to measure capabilities (Cover, Baum and Sontag) resulted in important extensions to their work. By clarifying the underlying mathematics of their approaches, closed-form formulas were derived for evaluating their measures, where before there were only estimates. However, when attempts were made to generalize these concepts, the whole premise of the approach was found to be too limiting. This prompted the search for an alternative view of measuring capabilities through ideas from combinatorial geometry. The following is a compilation of the contributions made by this research.

1. In order to develop the alternative approach of measuring ANN capabilities, the lattice structure of ANNs was established. It was found that a cut-intersection semi-lattice could be defined for ANNs. This result is formalized in Theorem 12.

2.  The  set  of  chambers  produced  by  ANNs  was  shown  to  be  a  semi-lattice.  This 
is  detailed  in  Theorem  15. 

3.  Studying  the  combinatorial  geometry  of  ANNs  produced  a  formula  for  the  V-C 
dimension.  It  was  established  that  the  V-C  dimension  can  be  reformulated  as 
the  well-known  geometric  invariant  of  a  hyperplane  arrangement,  cardinality 
of  the  set  of  chambers.  This  is  detailed  in  Theorem  14. 

4. Using the concepts of invariant analysis and lattice theory, a generalized framework in which capability analysis can be performed was described. This resulted in a generalized lattice structure established by Theorem 17. The generalized structure, based on invariant analysis, includes a quantifier, a partial ordering, and the resulting lattice.

5.  A  new  quantifier  of  ANN  capability  is  defined,  called  the  Ox-Cart  dimension. 
Theorem  21  shows  that  the  Ox-Cart  dimension  can  be  used  to  define  the  lattice 
of  ANNs. 

6. Moreover, the Ox-Cart dimension is based on geometric invariance, which is shown to be well suited for ANN capability analysis since it is directed specifically at signed sets, which represent arbitrary classification problems.
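
The chamber-count invariant named in contribution 3 can be made concrete with a short sketch. The formulas used are the classical counts for hyperplanes in general position and are assumed here for illustration; they are not a restatement of Theorem 14, and the function names are hypothetical. The first function counts chambers when thresholds are allowed, the second when every hyperplane passes through the origin.

```python
from math import comb


def chambers_affine(k, d):
    """Chambers cut out by k hyperplanes in general position in R^d (thresholds allowed)."""
    return sum(comb(k, i) for i in range(d + 1))


def chambers_central(k, d):
    """Chambers cut out by k hyperplanes through the origin, in general position, in R^d."""
    if k == 0:
        return 1  # no hyperplanes: the whole space is a single chamber
    return 2 * sum(comb(k - 1, i) for i in range(d))


# Three hidden nodes in the plane.
print(chambers_affine(3, 2))
print(chambers_central(3, 2))
```

For three lines in the plane, the arrangement with thresholds yields 7 chambers and the arrangement through the origin yields 6, which is consistent with the zero-threshold architecture being the more restricted of the two.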

9.1  Conclusions 

In summary, this document presents an alternative perspective for analyzing the capability of ANNs. The central point is that capability analysis should be viewed through a generalized framework describing the invariant nature of specific problems. That is, the nature of signed sets should be exploited to yield a characterization of separability which can readily be translated into requirements on an ANN architecture designed to separate the signed set. To facilitate this perspective, mathematical structure for ANNs is established. This structure is used to define an invariant of signed sets called the geometric complexity. This invariant is in turn used to define the new ANN capability quantifier called the Ox-Cart dimension.